Multivariate t Distributions and Their Applications


Samuel Kotz 
Saralees Nadarajah 




Almost all of the results available in the literature on multivariate t 
distributions published in the last 50 years are now collected together in 
this comprehensive volume. Because these distributions are becoming 
more prominent in many applications, this book is a must for any serious 
researcher or consultant working in multivariate analysis and statistical 
distributions. Much of this material has never appeared in book form. 

The first part of the book emphasizes theoretical results of a proba- 
bilistic nature. In the second part of the book, these are supplemented by 
a variety of statistical aspects. Various generalizations and applications 
are dealt with in the final chapters. The material on estimation and 
regression models is of special value for practitioners in statistics and 
economics. A comprehensive bibliography of more than 350 references is 
included. 


Samuel Kotz is Professor and Senior Research Scholar in the Depart- 
ment of Engineering Management and Systems Engineering at George 
Washington University. In addition to holding many distinguished vis- 
iting positions at prominent universities, he has authored a number 
of books and more than 100 articles, is Editor-in-Chief and founder 
of the Encyclopedia of Statistical Sciences, and holds three honorary 
doctorates. 


Saralees Nadarajah is Professor of Mathematics at the University of 
South Florida. He has made important contributions to distribution the- 
ory, extreme value theory and its applications in environmental model- 
ing, branching processes, and sampling theory. He has authored more 
than 80 papers and two books. 


Multivariate t Distributions 
and Their Applications 


Samuel Kotz 
George Washington University 


Saralees Nadarajah 
University of South Florida 


CAMBRIDGE UNIVERSITY PRESS


PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE 
The Pitt Building, Trumpington Street, Cambridge, United Kingdom 


CAMBRIDGE UNIVERSITY PRESS 

The Edinburgh Building, Cambridge CB2 2RU, UK 

40 West 20th Street, New York, NY 10011-4211, USA 

477 Williamstown Road, Port Melbourne, VIC 3207, Australia 
Ruiz de Alarcón 13, 28014 Madrid, Spain 

Dock House, The Waterfront, Cape Town 8001, South Africa 


http://www.cambridge.org
© Samuel Kotz and Saralees Nadarajah 2004 


This book is in copyright. Subject to statutory exception 

and to the provisions of relevant collective licensing agreements, 
no reproduction of any part may take place without 

the written permission of Cambridge University Press. 


First published 2004 

Printed in the United States of America 

Typeface Computer Modern 10/13 pt. System LaTeX 2ε [AU]
A catalog record for this book is available from the British Library. 


Library of Congress Cataloging in Publication Data 


Kotz, Samuel. 
Multivariate t distributions and their applications / Samuel Kotz, 
Saralees Nadarajah. 
p. cm. 
Includes bibliographical references and index. 
ISBN 0-521-82654-3 
1. Multivariate analysis. 2. Distribution (Probability theory) I. Nadarajah, 
Saralees. II. Title.
QA278.K635 2004
519.5'35-dc21 2003055353


ISBN 0 521 82654 3 hardback 


Contents 


List of Illustrations page ix 
Preface xi 


1 Introduction 1 
1.1 Definition 1 
1.2 Representations 2 
1.3 Characterizations 7 
1.4 A Closure Property 8 
1.5 A Consistency Property 9 


1.6 Density Expansions 9 
1.7 Moments 10 
1.8 Maximums 12 
1.9 Distribution of a Linear Function 15 
1.10 Marginal Distributions 15 
1.11 Conditional Distributions 16 
1.12 Quadratic Forms 19 
1.13 F Matrix 20 
1.14 Association 20 
1.15 Entropy 21 
1.16 Kullback-Leibler Number 23 
1.17 Rényi Information 26 
1.18 Identities 30 
1.19 Some Special Cases 33 
2 The Characteristic Function 36 
2.1 Sutradhar’s Approach 36 
2.2 Joarder and Ali’s Approach 38 
2.3 Lévy Representation 41 
3 Linear Combinations, Products, and Ratios 44 
3.1 Linear Combinations 44 
3.2 Products 52 
3.3 Ratios 56 




4 Bivariate Generalizations and Related Distributions 63
4.1 Owen’s Noncentral Bivariate t Distribution 63
4.2 Siddiqui’s Noncentral Bivariate t Distribution 66
4.3 Patil and Liao’s Noncentral Bivariate t Distribution 68
4.4 Krishnan’s Noncentral Bivariate t Distribution 69
4.5 Krishnan’s Doubly Noncentral Bivariate t Distribution 71
4.6 Bulgren et al.’s Bivariate t Distribution 72
4.7 Siotani’s Noncentral Bivariate t Distribution 73
4.8 Tiku and Kambo’s Bivariate t Distribution 74
4.9 Conditionally Specified Bivariate t Distribution 76
4.10 Jones’ Bivariate t Distribution 80

5 Multivariate Generalizations and Related Distributions 87
5.1 Kshirsagar’s Noncentral Multivariate t Distribution 87
5.2 Miller’s Noncentral Multivariate t Distribution 90
5.3 Stepwise Multivariate t Distribution 90
5.4 Siotani’s Noncentral Multivariate t Distribution 93
5.5 Arellano-Valle and Bolfarine’s Generalized t Distribution 94
5.6 Fang et al.’s Asymmetric Multivariate t Distribution 97
5.7 Gupta’s Skewed Multivariate t Distribution 98
5.8 Sahu et al.’s Skewed Multivariate t Distribution 102
5.9 Azzalini and Capitanio’s Skewed Multivariate t Distribution 103
5.10 Jones’ Skewed Multivariate t Distribution 105
5.11 Matrix-Variate t Distribution 112
5.12 Complex Multivariate t Distribution 119
5.13 Steyn’s Nonnormal Distributions 120
5.14 Inverted Dirichlet Distribution 126

6 Probability Integrals 127
6.1 Dunnett and Sobel’s Probability Integrals 127
6.2 Gupta and Sobel’s Probability Integrals 131
6.3 John’s Probability Integrals 135
6.4 Amos and Bulgren’s Probability Integrals 137
6.5 Steffens’ Noncentral Probabilities 139
6.6 Dutt’s Probability Integrals 140
6.7 Amos’ Probability Integral 143
6.8 Fujikoshi’s Probability Integrals 144
6.9 Probabilities of Cone 145
6.10 Probabilities of Convex Polyhedra 148
6.11 Probabilities of Linear Inequalities 158
6.12 Maximum Probability Content 160
6.13 Monte Carlo Evaluation 161

7 Probability Inequalities 165
7.1 Dunnett and Sobel’s Probability Inequalities 165
7.2 Dunn’s Probability Inequalities 169




7.3 Halperin’s Probability Inequalities 170 
7.4 Šidák’s Probability Inequalities 171
7.5 Tong’s Probability Inequalities 172 

8 Percentage Points 174 
8.1 Dunnett and Sobel’s Percentage Points 174 
8.2 Krishnaiah and Armitage’s Percentage Points 175 
8.3 Gupta et al.’s Percentage Points 176 
8.4 Rausch and Horn’s Percentage Points 176 
8.5 Hahn and Hendrickson’s Percentage Points 177 
8.6 Siotani’s Percentage Points 177 
8.7 Graybill and Bowden’s Percentage Points 178 
8.8 Pillai and Ramachandran’s Percentage Points 180 
8.9 Dunnett’s Percentage Points 180 
8.10 Gupta and Sobel’s Percentage Points 181 
8.11 Chen’s Percentage Points 182 
8.12 Bowden and Graybill’s Percentage Points 183 
8.13 Dunnett and Tamhane’s Percentage Points 183 
8.14 Kwong and Liu’s Percentage Points 187 
8.15 Other Results 187 

9 Sampling Distributions 191 
9.1 Wishart Matrix 191 
9.2 Multivariate t Statistic 198 
9.3 Hotelling’s T² Statistic 199
9.4 Entropy and Kullback-Leibler Number 204 
10 Estimation 207 
10.1 Tiku and Kambo’s Estimation Procedure 207 
10.2 ML Estimation via EM Algorithm 210 
10.3 Missing Data Imputation 212 
10.4 Laplacian T-Approximation 214 
10.5 Sutradhar’s Score Test 215 
10.6 Multivariate t Model 219 
10.7 Generalized Multivariate t Model 222 
10.8 Simulation 223 
11 Regression Models 228 
11.1 Classical Linear Model 228 
11.2 Bayesian Linear Models 233 
11.3 Indexed Linear Models 235 
11.4 General Linear Model 237 
11.5 Nonlinear Models 239 
12 Applications 241 
12.1 Projection Pursuit 241 


12.2 Portfolio Optimization 243 


12.3 Discriminant and Cluster Analysis 244
12.4 Multiple Decision Problems 245
12.5 Other Applications 246

References 247
Index 269


List of Illustrations

1.1 Joint contours of (1.1) with degrees of freedom $\nu = 1$, zero means,
    and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$ page 3
1.2 Joint contours of (1.1) with degrees of freedom $\nu = 2$, zero means,
    and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$ 4
1.3 Joint contours of (1.1) with degrees of freedom $\nu = 10$, zero means,
    and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$ 5
1.4 Joint contours of (1.1) with degrees of freedom $\nu = 30$, zero means,
    and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$ 6
1.5 Mutual information, (1.31), for $p = 2$ 26
1.6 Mutual information, (1.31), for $p = 4$ 27
3.1 Densities of the $t$-ratio distribution (3.11) for $(m_x, m_y, \nu) =
    (0, 0, 1)$, $(0, 0, 30)$, $(1, 3, 1)$, $(1, 3, 30)$, $(3, 1, 1)$,
    $(3, 1, 30)$, $(3, 3, 1)$, and $(3, 3, 30)$ 58
4.1 Jones’ bivariate skew $t$ pdf (4.33) for (a) $\nu_1 = 2$ and
    $\nu_2 = 3$; and (b) $\nu_1 = 2$ and $\nu_2 = 20$ 83
4.2 Jones’ bivariate skew $t$ pdf (4.40) for (a) $a = 1$, $b = 1$, and
    $c = 1$; (b) $a = 3$, $b = 4$, and $c = 5$; (c) $a = 5$, $b = 1$, and
    $c = 1$; and (d) $a = 1$, $b = 5$, and $c = 1$ 85
5.1 Fang et al.’s asymmetric $t$ pdf (5.16) in the bivariate case
    (a) $m = 2$, $m_1 = 10$, $m_2 = 10$, and $r_{12} = 0$; (b) $m = 2$,
    $m_1 = 10$, $m_2 = 2$, and $r_{12} = 0$; (c) $m = 2$, $m_1 = 10$,
    $m_2 = 10$, and $r_{12} = 0.5$; and (d) $m = 2$, $m_1 = 10$,
    $m_2 = 10$, and $r_{12} = 0.9$ 99
5.2 Jones’ skewed multivariate $t$ pdf (5.28) for $p = 2$ and (a)
    $a = 6$, $\nu = 3$, and $c = 2$; and (b) $a = 2$, $\nu = 3$, and
    $c = 6$ 107
5.3 Jones’ skewed multivariate $t$ pdf (5.29) for $p = 2$ and (a)
    $\nu = 1$; and (b) $\nu = 20$ 108
5.4 Jones’ skewed multivariate $t$ pdf (5.32) for $p = 2$ and (a)
    $\nu_0 = 2$, $\nu_1 = 4$, and $\nu_2 = 4$; (b) $\nu_0 = 2$,
    $\nu_1 = 20$, and $\nu_2 = 1$; (c) $\nu_0 = 2$, $\nu_1 = 1$, and
    $\nu_2 = 20$; and (d) $\nu_0 = \nu_1 = \nu_2 = 2$ 112
5.5 Steyn’s bivariate pdf corresponding to (5.46) for t3 = 0
    and (a) $k_1 = 0.8$, $k_2 = -0.4$, and $r_{12} = 0.2$; (b) $k_1 = 0.8$,
    $k_2 = -0.4$, and $r_{12} = 0.8$; (c) $k_1 = -0.4$, $k_2 = 0.8$, and
    $r_{12} = 0.2$; and (d) $k_1 = -0.4$, $k_2 = 0.8$, and $r_{12} = 0.8$ 124
6.1 The sets A,(c) and $E(c) \cap \{\|z\| = r\}$ in two dimensions 146


Preface 


Multivariate t distributions have attracted somewhat limited attention of
researchers for the last 70 years in spite of their increasing importance in
classical as well as in Bayesian statistical modeling. These distributions
have been, perhaps unjustly, overshadowed during all these years by
the multivariate normal distribution. Both the multivariate t and the
multivariate normal are members of the general family of elliptically
symmetric distributions. However, we feel that it is desirable to focus
on these distributions separately for several reasons:


• Multivariate t distributions are generalizations of the classical
  univariate Student t distribution, which is of central importance in
  statistical inference. The possible structures are numerous, and each
  one possesses special characteristics as far as potential and current
  applications are concerned.

• Application of multivariate t distributions is a very promising
  approach in multivariate analysis. Classical multivariate analysis is
  soundly and rigidly tilted toward the multivariate normal distribution
  while multivariate t distributions offer a more viable alternative
  with respect to real-world data, particularly because their tails are
  more realistic. We have seen recently some unexpected applications in
  novel areas such as cluster analysis, discriminant analysis, multiple
  regression, robust projection indices, and missing data imputation.

• Multivariate t distributions for the past 20 to 30 years have played a
  crucial role in Bayesian analysis of multivariate data. They serve by
  now as the most popular prior distribution (because elicitation of prior
  information in various physical, engineering, and financial phenomena
  is closely associated with multivariate t distributions) and generate
  meaningful posterior distributions. This diversity and the apparent
  ease of applications require careful analysis of the properties of the
  distribution in order to avoid pitfalls and misrepresentation.


The compilation of this book was a somewhat daunting task (as our 
Contents indicates). Indeed, the scope of the multivariate t distribu- 
tions is unsurpassed, and, although there are books dealing with multi- 
variate continuous distributions and review articles in the Encyclopedia 
of Statistical Sciences and Biostatistics, the material presented in these 
sources is quite limited. 

Our goal was to collect and present in an organized and user-friendly 
manner all of the relevant information available in the literature worthy 
of publication. It is our hope that the readers — both novices and experts 
— will find the book useful. Our thanks are due to numerous authors who 
generously supplied us with their contributions and to Lauren Cowles, 
Elise Oranges and Lara Zoble at Cambridge University Press for their 
guidance. We also wish to thank Anusha Thiyagarajah for help with 
editing. 


Samuel Kotz 
Saralees Nadarajah 


1 


Introduction 


1.1 Definition 


There exist quite a few forms of multivariate $t$ distributions, which will
be discussed in subsequent chapters. In this chapter, however, we shall
describe the most common and natural form. It directly generalizes the
univariate Student’s $t$ distribution in the same manner that the
multivariate normal distribution generalizes the univariate normal
distribution.

A $p$-dimensional random vector $X = (X_1, \ldots, X_p)^T$ is said to have
the $p$-variate $t$ distribution with degrees of freedom $\nu$, mean vector
$\mu$, and correlation matrix $R$ (and with $\Sigma$ denoting the
corresponding covariance matrix) if its joint probability density function
(pdf) is given by

\[
f(x) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}
\left[1 + \frac{1}{\nu}(x - \mu)^T R^{-1} (x - \mu)\right]^{-(\nu + p)/2}.
\tag{1.1}
\]

The degrees of freedom parameter $\nu$ is also referred to as the shape
parameter, because the peakedness of (1.1) may be diminished, preserved,
or increased by varying $\nu$ (see Section 1.4). The distribution is said to
be central if $\mu = 0$; otherwise, it is said to be noncentral.
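
As an illustration only, the pdf (1.1) can be evaluated numerically on the log scale. The sketch below assumes NumPy is available; the function name `mvt_logpdf` and its argument order are hypothetical choices made for this example, not notation from the text.

```python
import math
import numpy as np

def mvt_logpdf(x, mu, R, nu):
    """Log of the p-variate t pdf (1.1) at the point x (illustrative helper)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    R = np.atleast_2d(np.asarray(R, dtype=float))
    p = x.size
    d = x - mu
    quad = float(d @ np.linalg.solve(R, d))      # (x - mu)^T R^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(R)             # log |R|
    log_norm = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * p * math.log(math.pi * nu) - 0.5 * logdet)
    return log_norm - 0.5 * (nu + p) * math.log1p(quad / nu)

# p = 1, nu = 1 is the standard Cauchy density, whose value at 0 is 1/pi.
print(math.exp(mvt_logpdf([0.0], [0.0], [[1.0]], nu=1.0)), 1.0 / math.pi)
```

Working with the logarithm avoids overflow in the gamma functions when $\nu$ or $p$ is large.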

Note that if $p = 1$, $\mu = 0$, and $R = 1$, then (1.1) is the pdf of the
univariate Student’s $t$ distribution with degrees of freedom $\nu$. These
univariate marginals have increasingly heavy tails as $\nu$ decreases toward
unity. With or without moments, the marginals become successively less
peaked about $0 \in \mathbb{R}$ as $\nu \downarrow 1$.

If $p = 2$, then (1.1) is a slight modification of the bivariate surface of
Pearson (1923). If $\nu = 1$, then (1.1) is the $p$-variate Cauchy
distribution. If $(\nu + p)/2 = m$, an integer, then (1.1) is the
$p$-variate Pearson type VII distribution. The limiting form of (1.1) as
$\nu \to \infty$ is the joint pdf of the $p$-variate normal distribution
with mean vector $\mu$ and covariance matrix $\Sigma$. Hence, (1.1) can be
viewed as an approximation of the multivariate normal distribution. The
particular case of (1.1) for $\mu = 0$ and $R = I_p$ is a mixture of the
normal density with zero means and covariance matrix $vI_p$, the mixing
being in the scale parameter $v$. The class of elliptically contoured
distributions (see, for example, Fang et al., 1990) contains (1.1) as a
particular case. Also (1.1) has the attractive property of being
Schur-concave when elements of $R$ satisfy $r_{ij} = \rho$, $i \neq j$ (see
Marshall and Olkin, 1974). Namely, if $a$ and $b$ are two $p$-variate
vectors with components ordered to achieve
$a_1 \geq a_2 \geq \cdots \geq a_p$ and $b_1 \geq b_2 \geq \cdots \geq b_p$,
and if this ordering implies $\sum_{i=1}^{k} a_i \leq \sum_{i=1}^{k} b_i$
for $k = 1, 2, \ldots, p - 1$ and $\sum_{i=1}^{p} a_i = \sum_{i=1}^{p} b_i$,
then (1.1) satisfies $f(a) \geq f(b)$.

In Bayesian analyses, (1.1) arises as: (1) the posterior distribution of 
the mean of a multivariate normal distribution (Geisser and Cornfield, 
1963; see also Stone, 1964); (2) the marginal posterior distribution of 
the regression coefficient vector of the traditional multivariate regres- 
sion model (Tiao and Zellner, 1964); (3) the marginal prior distribution 
of the mean of a multinormal process (Ando and Kaufman, 1965); (4) 
the marginal posterior distribution of the mean and the predictive dis- 
tribution of a future observation of the multivariate normal structural 
model (Fraser and Haq, 1969); (5) an approximation to posterior dis- 
tributions arising in location-scale regression models (Sweeting, 1984, 
1987); and (6) the prior distribution for set estimation of a multivariate 
normal mean (DasGupta et al., 1995). Additional applications of (1.1) 
can be seen in the numerous books dealing with the Bayesian aspects of 
multivariate analysis. 


1.2 Representations 
If $X$ has the $p$-variate $t$ distribution with degrees of freedom $\nu$,
mean vector $\mu$, and correlation matrix $R$, then it can be represented as
follows.

• If $Y$ is a $p$-variate normal random vector with mean $0$ and covariance
  matrix $\Sigma$, and if $\nu S^2/\sigma^2$ is the chi-squared random
  variable with degrees of freedom $\nu$, independent of $Y$, then

  \[
  X = S^{-1} Y + \mu. \tag{1.2}
  \]

  This implies that $X \mid S = s$ has the $p$-variate normal distribution
  with mean vector $\mu$ and covariance matrix $(1/s^2)\Sigma$.


Fig. 1.1. Joint contours of (1.1) with degrees of freedom $\nu = 1$, zero
means, and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.2. Joint contours of (1.1) with degrees of freedom $\nu = 2$, zero
means, and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.3. Joint contours of (1.1) with degrees of freedom $\nu = 10$, zero
means, and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.4. Joint contours of (1.1) with degrees of freedom $\nu = 30$, zero
means, and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$


• If $V^{1/2}$ is the symmetric square root of $V$, that is,

  \[
  V^{1/2} V^{1/2} = V \sim W_p(R^{-1}, \nu + p - 1), \tag{1.3}
  \]

  where $W_p(\Sigma, n)$ denotes the $p$-variate Wishart distribution with
  degrees of freedom $n$ and covariance matrix $\Sigma$, and if $Y$ has the
  $p$-variate normal distribution with zero means and covariance matrix
  $\nu I_p$ ($I_p$ is the $p$-dimensional identity matrix), independent of
  $V$, then

  \[
  X = \left(V^{1/2}\right)^{-1} Y + \mu \tag{1.4}
  \]

  (Ando and Kaufman, 1965). This implies that $X \mid V$ has the
  $p$-variate normal distribution with mean vector $\mu$ and covariance
  matrix $\nu V^{-1}$.
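
The normal and chi-squared representation of this section (a normal vector divided by an independent scaled chi variable, then shifted) translates directly into a sampler. The following is a minimal sketch, assuming NumPy and taking $\sigma = 1$ so that $\Sigma = R$; the name `sample_mvt` is hypothetical. The sample covariance is compared with $\nu/(\nu - 2)\,R$, the covariance of the $t$ vector when $\nu > 2$.

```python
import numpy as np

def sample_mvt(n, mu, R, nu, rng):
    """Draw n vectors X = Y / S + mu with Y ~ N(0, R) and nu * S^2 ~ chi^2_nu."""
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(np.asarray(R, dtype=float))
    y = rng.standard_normal((n, mu.size)) @ L.T      # rows are N(0, R) draws
    s = np.sqrt(rng.chisquare(nu, size=n) / nu)      # S, independent of Y
    return mu + y / s[:, None]

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.6], [0.6, 1.0]])
x = sample_mvt(200_000, [1.0, -1.0], R, nu=10, rng=rng)
print(x.mean(axis=0))               # close to the mean vector (1, -1)
print(np.cov(x, rowvar=False))      # close to nu / (nu - 2) * R = 1.25 * R
```

Conditionally on $S = s$ each row is normal with covariance $R/s^2$, which is what the division by `s` implements.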


1.3 Characterizations 
From representation (1.2) it easily follows for any $a \neq 0$ that $X$ has
the joint pdf (1.1) if and only if

\begin{align*}
& X \mid S^2 = s^2 \sim N(\mu, s^{-2}\Sigma) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \mid S^2 = s^2
  \sim N(0, s^{-2}) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \sim t_\nu,
\end{align*}

and this is one of the earliest characterization results given in Cornish
(1962). This result can also be obtained by using the representation
(1.4): $X$ has the joint pdf (1.1) if and only if

\begin{align*}
& X \mid V \sim N(\mu, \nu V^{-1}) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \mid V
  \sim N\left(0, \nu\,(a^T V^{-1} a)/(a^T \Sigma a)\right) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \sim t_\nu,
\end{align*}

as noted by Lin (1972).

Lin (1972) obtained two further characterizations using the
representation (1.2). Let $\nu S^2 \sim \chi^2_\nu$ and let
$X_1, X_2, \ldots, X_p$ be conditionally independent continuous random
variables symmetrically distributed with $E(X_k \mid S^2 = s^2) = \mu_k$
and $\mathrm{Var}(X_k \mid S^2 = s^2) = \sigma_k^2/s^2 < \infty$ for
$k = 1, \ldots, p$. Then the following characterizations are valid:

• $(X_1, X_2, \ldots, X_p)^T$ has the joint pdf (1.1) with mean vector
  $\mu$, covariance matrix $D$, and degrees of freedom $\nu$ if and only if

  \[
  \sum_{k=1}^{p} \frac{(X_k - \mu_k)^2}{p\,\sigma_k^2} \sim F_{p,\nu},
  \]

  where $D$ is a $p \times p$ diagonal matrix with its $k$th diagonal
  element equal to $\sigma_k^2$.

• In the special case where $\sigma_k^2 = \sigma^2$ for all $k$ and the
  conditional pdf of $X_k \mid S^2 = s^2$ is positive and differentiable
  for all $x \in \mathbb{R}$, $(X_1, X_2, \ldots, X_p)^T$ has the joint pdf
  (1.1) with zero means, covariance matrix $\sigma^2 I_p$, and degrees of
  freedom $\nu$ if and only if the joint pdf of $X_1, X_2, \ldots, X_p$ is
  a function of $x_1^2 + x_2^2 + \cdots + x_p^2$ only.


1.4 A Closure Property 


Consider Studentizing transformations $T : \mathbb{R}^n \to \mathbb{R}^k$,
depending on matrices $A\,(n \times k)$, $B\,(n \times \nu)$, and
$Q\,(n \times n)$, given by


ATX 


TX) = Terx] 


(1.5) 


such that $A^T Q B = 0$. Jensen (1994) established that the class of
multivariate $t$ distributions is closed under the transform $T(\cdot)$.
Specifically, assume $A^T A = I_k$, $B^T B = I_\nu$, and $X$ is distributed
according to (1.1) with zero means, correlation matrix $I_n$, and degrees of
freedom $m$. Under these assumptions, Jensen showed that $T(X)$ is also
distributed according to (1.1) with zero means, correlation matrix $I_k$,
and degrees of freedom $\nu$.

Jensen (1994) also studied the concentration properties of (1.1) via
peakedness by varying its parameters. If $X$ is multivariate normal, then
the transformation $X \to T(X)$ diminishes the peakedness. If, on the
other hand, $X$ is distributed according to (1.1) with mean vector
$\mu 1_n$, covariance matrix $\sigma^2 I_n$, and degrees of freedom $m$,
then the transformation is peakedness-enhancing for all $m < \nu$. If
$m > \nu > 2$, then the transformation serves to increase variances. For
any $m > \nu > 0$ the marginal distributions are less peaked after $T(X)$
than before in the sense of Birnbaum (1948). If $m = \nu$, then the
marginals are identical before and after $T(X)$, thus exhibiting identical
tail behavior. If $\nu > m$, then marginals are more peaked (in the sense of
Birnbaum, 1948) after applying $T(X)$ than before; and if $\nu > m > 2$,
then $T(X)$ serves as a variance-diminishing transformation.




1.5 A Consistency Property 


A random vector $X = (X_1, \ldots, X_p)^T$ is said to have the spherical
distribution if its joint pdf can be written in the form

\[
g\left(\sum_{i=1}^{p} x_i^2\right),
\]

where $g(\cdot)$ is referred to as the density generator. The $p$-variate
$t$ pdf (1.1) with $\mu = 0$ and $R = I_p$ is spherical because in this
case,

\[
g(u) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}
\left(1 + \frac{u}{\nu}\right)^{-(\nu + p)/2}.
\]

Other examples of spherical distributions include the multivariate normal
and the multivariate power exponential. Writing $g(\cdot \mid p)$ to make
the dependence of the generator on the dimension explicit, a spherical
distribution is said to possess the consistency property if

\[
\int_{-\infty}^{\infty} g\left(\sum_{i=1}^{p+1} x_i^2 \,\Big|\, p + 1\right)
dx_{p+1} = g\left(\sum_{i=1}^{p} x_i^2 \,\Big|\, p\right) \tag{1.6}
\]

for any integer $p$ and almost all $x \in \mathbb{R}^p$. This consistency
property ensures that any marginal distribution of $X$ also belongs to the
same spherical family. Kano (1994) provided several necessary and
sufficient conditions for a spherical distribution to satisfy (1.6). One
of them is that $g$ must be a mixture of normal distributions;
specifically, there exists a random variable $Z > 0$, unrelated to $p$,
such that, for any $p$,

\[
g(u \mid p) = \int \left(\frac{z}{2\pi}\right)^{p/2}
\exp\left(-\frac{uz}{2}\right) F(dz),
\]

where $F(\cdot)$ denotes the cumulative distribution function (cdf) of
$Z$. Since the multivariate $t$ is a mixture of normal distributions (see
(1.2)), it follows that it must have the consistency property. Other
distributions that have the consistency property include the multivariate
normal and the multivariate Cauchy. Distributions that do not share this
property include the multivariate logistic, multivariate Pearson type II,
multivariate Pearson type VII, and the multivariate Bessel.
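
The consistency property can be checked numerically for the $t$ generator: integrating the $(p+1)$-dimensional generator over its last coordinate should return the $p$-dimensional one. The sketch below uses only the Python standard library; the helper names and the midpoint rule after the substitution $x = \tan\theta$ are choices made for this illustration.

```python
import math

def g(u, p, nu):
    """Density generator of the p-variate t pdf with zero means and identity R."""
    log_c = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * p * math.log(math.pi * nu))
    return math.exp(log_c - 0.5 * (nu + p) * math.log1p(u / nu))

def integrate_out_last(u, p, nu, n=20_000):
    """Integrate g(u + x^2 | p + 1) over the real line in x.

    The substitution x = tan(theta) maps the line to (-pi/2, pi/2); the
    midpoint rule avoids evaluating at the endpoints.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2.0 + (i + 0.5) * h
        x = math.tan(theta)
        total += g(u + x * x, p + 1, nu) / math.cos(theta) ** 2
    return total * h

for u in (0.0, 0.5, 3.0):
    print(u, integrate_out_last(u, p=2, nu=3.0), g(u, p=2, nu=3.0))
```

The two printed columns should agree to many digits for each value of `u`.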


1.6 Density Expansions 
Fisher (1925) and later Dickey (1967a) provided expansions of the pdf

\[
f(x) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}
\left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}
\]

of the univariate Student’s $t$ distribution. The expansion in the latter
paper involves Appell’s polynomials, and hence recurrence schemes are
available for its coefficients. Specifically,


fee) = Shrew (+t e) Yas (—*E pe) atu 
(1.7) 
where 
k-1 
Ga) = Pelt) ~ a QDPD). (1.8) 
T Zo 


Here, $P_k(t)$ are polynomials (in powers of $t$) satisfying


-(1+v)/2 
Trdmary® = (1-75) e9 
k 


l+v 


and $P_k(\Gamma)$ denotes the polynomial $P_k(t)$ with the powers $t^r$
replaced by $\Gamma(r + 1/2)$. Dickey (1967a) also provided an analog of
(1.7) for the multivariate $t$ pdf (1.1). It takes the same form as (1.7)
with $x^2$ replaced by $(x - \mu)^T R^{-1} (x - \mu)$, $\nu + 1$ replaced by
$\nu + p$, and with (1.8) replaced by


Qt) = P,(t)- TỌ rom Èa )Pk-a( 


where $\Gamma_p$ indicates the substitution of $\Gamma(r + p/2)$ for $t^r$.


1.7 Moments 


Since $Y$ and $S$ in (1.2) are independent, the conditional distribution of
$(X_i, X_j)$, given $S = s$, is bivariate normal with means
$(\mu_i, \mu_j)$, common variance $\sigma^2/s^2$, and correlation
coefficient $r_{ij}$. Thus,

\[
E(X_i) = E\left[E(X_i \mid S = s)\right] = E(\mu_i) = \mu_i.
\]

To find the second moments, consider the classical identity

\[
\mathrm{Cov}(X_i, X_j) = E\left[\mathrm{Cov}(X_i, X_j \mid S = s)\right]
+ \mathrm{Cov}\left[E(X_i \mid S = s), E(X_j \mid S = s)\right]
\]

for all $i, j = 1, \ldots, p$. Clearly, one has

\[
E\left[\mathrm{Cov}(X_i, X_j \mid S = s)\right]
= \sigma^2 r_{ij}\, E\left(\frac{1}{S^2}\right)
\]

and

\[
\mathrm{Cov}\left[E(X_i \mid S = s), E(X_j \mid S = s)\right] = 0.
\]

If $\nu > 2$, then $E(1/S^2)$ exists and is equal to
$\nu/\{\sigma^2(\nu - 2)\}$. Thus, by choosing $i = j$ and $i < j$,
respectively, one obtains

\[
\mathrm{Var}(X_i) = \frac{\nu}{\nu - 2}
\]

and

\[
\mathrm{Cov}(X_i, X_j) = \frac{\nu}{\nu - 2}\, r_{ij}.
\]

Hence the matrix $R$ is indeed the correlation matrix as stated in
definition (1.1).

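In the case where $\mu = 0$, the product moments of $X$ are easily found
by exploiting the independence of $Y$ and $S$ in (1.2). One obtains

\[
\mu_{r_1, r_2, \ldots, r_p}
= E\left[\prod_{j=1}^{p} X_j^{r_j}\right]
= E\left[S^{-r} \prod_{j=1}^{p} Y_j^{r_j}\right]
= \sigma^{-r} \nu^{r/2}\, E\left[(\chi^2_\nu)^{-r/2}\right]
  E\left[\prod_{j=1}^{p} Y_j^{r_j}\right],
\]

provided that $r = r_1 + r_2 + \cdots + r_p < \nu$. In the special case
where $Y_1, \ldots, Y_p$ are mutually independent, one obtains

\[
\mu_{r_1, r_2, \ldots, r_p}
= \sigma^{-r} \nu^{r/2}\, E\left[(\chi^2_\nu)^{-r/2}\right]
  \prod_{j=1}^{p} E\left[Y_j^{r_j}\right].
\]

If any one of the $r_j$’s is odd, then the moment is zero. If all of them
are even, then

\[
\mu_{r_1, r_2, \ldots, r_p}
= \frac{\nu^{r/2} \prod_{j=1}^{p} \{1 \cdot 3 \cdot 5 \cdots (r_j - 1)\}}
       {(\nu - 2)(\nu - 4) \cdots (\nu - r)}.
\]

In particular,

\[
\mu_{2,0,\ldots,0} = \frac{\nu}{\nu - 2}, \quad \nu > 2,
\]

\[
\mu_{4,0,\ldots,0} = \frac{3\nu^2}{(\nu - 2)(\nu - 4)}, \quad \nu > 4,
\]

\[
\mu_{2,2,0,\ldots,0} = \frac{\nu^2}{(\nu - 2)(\nu - 4)}, \quad \nu > 4,
\]

and

\[
\mu_{2,2,2,0,\ldots,0} = \frac{\nu^3}{(\nu - 2)(\nu - 4)(\nu - 6)},
\quad \nu > 6.
\]
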
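
The particular cases listed above can be reproduced from the product-moment argument: the moment of $S^{-r}$ times the normal moments. The following sketch uses only the standard library and makes two assumptions of its own: $r_j$ is read as the power of $X_j$, and $R = I_p$ with $\sigma = 1$. The function names are hypothetical.

```python
import math

def odd_product(n):
    """Return 1 * 3 * 5 * ... * n for odd n (the empty product, n = -1, is 1)."""
    out = 1
    for k in range(1, n + 1, 2):
        out *= k
    return out

def product_moment(orders, nu):
    """E[prod_j X_j^{r_j}] for the central t pdf with R = I_p (illustrative)."""
    r = sum(orders)
    if r >= nu:
        raise ValueError("the moment exists only when r_1 + ... + r_p < nu")
    if any(rj % 2 for rj in orders):
        return 0.0                      # an odd power makes the moment vanish
    # E[S^{-r}] = (nu/2)^{r/2} * Gamma((nu - r)/2) / Gamma(nu/2) when sigma = 1
    log_inv_s = (0.5 * r * math.log(nu / 2.0)
                 + math.lgamma((nu - r) / 2.0) - math.lgamma(nu / 2.0))
    normal_part = math.prod(odd_product(rj - 1) for rj in orders)
    return math.exp(log_inv_s) * normal_part

nu = 10.0
print(product_moment((2, 0), nu), nu / (nu - 2))
print(product_moment((4, 0), nu), 3 * nu**2 / ((nu - 2) * (nu - 4)))
print(product_moment((2, 2), nu), nu**2 / ((nu - 2) * (nu - 4)))
print(product_moment((2, 2, 2), nu), nu**3 / ((nu - 2) * (nu - 4) * (nu - 6)))
```

Each line prints the general expression next to the corresponding closed form, so the two columns should match.
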
1.8 Maximums 

Of special interest are the moments of $Z = \max(X_1, \ldots, X_p)$ when
$X^T = (X_1, \ldots, X_p)$ has the $t$ pdf (1.1) with the mean vector $\mu$
and covariance matrix $\Sigma$. These moments have applications in decision
theory, particularly in the selection and estimation of the maximum of a
set of parameters. They also have applications in forecasting. The problem
of finding the moments of $Z$ has been considered by Raiffa and Schlaifer
(1961), Afonja (1972), and Cain (1996).

Raiffa and Schlaifer (1961) provided an expression for $E(Z - \theta)$ for
the case where $p = 3$ and $\mu = \theta 1_p$ (where $1_p$ denotes a vector
of 1’s). Afonja (1972) generalized this for the general case of unequal
means, variances, and correlations. We mention later a particular case of
this result for $\mu = \theta 1_p$. Let $\phi_p(y; R)$ denote a
$p$-dimensional normal pdf with zero means, unit variances, and correlation
matrix $R$. Also let $R_i$ denote a $p \times p$ matrix with its
$(j, j')$th element equal to $r_{i,jj'}$, where $r_{i,jj'}$
($j, j' \neq i$) is the correlation between $(X_i - X_j)$ and
$(X_i - X_{j'})$ and $r_{i,ij} = \mathrm{corr}(X_i, X_i - X_j)$. Then the
$k$th moment of $Z$ is given by


E) = roy >> (5) i (se) 0 (42) Hs (ye) 


i=1 j=0 
(1.9) 


where 


Pf fin fF devon 


dypdyp—1 +++ dyi: + - dy2dyı (1.10) 




is the marginal moment (up to a constant) of truncated normal variates. 
The mean and variance can be derived easily from this formula. For 


example, 
y-1 V 
E(Z) = 6+{E(W) -ar ( ) /v (=) 
where $W = \max(Y_1, \ldots, Y_p)$ for a $p$-variate normal random vector
$Y^T = (Y_1, \ldots, Y_p)$ with means equal to $\theta$ and covariance
matrix $(\nu/(\nu - 2))D$.

Afonja (1972) showed further that 


Pp 
EW) = 0+\/-—5 >> vam (ui), 


where ji (y;) is given by (1.10) for 7 = 1. 

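More recently, Cain (1996) considered two forecasts $F_1$ and $F_2$ of a
future variable $Y$ where the forecast errors $X_1 = F_1 - Y$ and
$X_2 = F_2 - Y$ are assumed to have the bivariate $t$ distribution with
means $(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$, correlation
coefficient $\rho$, and degrees of freedom $\nu > 2$. Cain was interested
in the maximum $Z = \max(X_1, X_2)$ of the two forecast errors and whether
this nonlinear function could be useful as a component of a linear
combination forecast. It was shown that the pdf of $Z$ can be written as
the sum

\[
f(z) = f_1(z) + f_2(z),
\]

where

\[
f_j(z) = \frac{1}{\sigma_j} \sqrt{\frac{\nu}{\nu - 2}}\;
t_\nu\left(\sqrt{\frac{\nu}{\nu - 2}}\, \frac{z - \mu_j}{\sigma_j}\right)
\times T_{\nu + 1}\left(
\frac{\sqrt{\nu + 1}\,\{\sigma_j (z - \mu_k) - \rho\sigma_k (z - \mu_j)\}}
     {\sigma_k \sqrt{1 - \rho^2}\,
      \sqrt{(\nu - 2)\sigma_j^2 + (z - \mu_j)^2}}
\right)
\]

for $k = 3 - j$, $j = 1, 2$. Here, $t_\nu$ and $T_\nu$ are, respectively,
the pdf and the cdf of the Student’s $t$ distribution with degrees of
freedom $\nu$. Integration by parts yields that

\[
E(Z) = \mu_1 \int_{-\infty}^{\infty} f_1(z)\,dz
     + \mu_2 \int_{-\infty}^{\infty} f_2(z)\,dz
     + \tau\, t_{\nu - 2}\left(\frac{\mu_1 - \mu_2}{\tau}\right),
\]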


Var(Z) = o? a filz)dz + o2 T fa(z)dz 




+m- f hode f hod 


+7 (m — He) tv-2 (=£) T fo(z)dz 


-r (1 =n) tran (E) f" p(eyae 


(u — m) (03 — o7) Hı — pe 
v2) toa (BO = 


Hı — H2 
-r°t 5 (“e 7 ) , 
and 


Cov(Z,Xı) = oF |  fileide+ pron f Hoa 
(m — H2) (of — poio) | ; (£ - r) 
) v- T , 


$ T(v -2 


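where $\tau = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$. The
two integrals in the above expressions can be evaluated as

\[
\int_{-\infty}^{\infty} f_1(z)\,dz
= T_\nu\left(\sqrt{\frac{\nu}{\nu - 2}}\, \frac{\mu_1 - \mu_2}{\tau}\right),
\qquad
\int_{-\infty}^{\infty} f_2(z)\,dz
= 1 - T_\nu\left(\sqrt{\frac{\nu}{\nu - 2}}\,
  \frac{\mu_1 - \mu_2}{\tau}\right).
\]

The expression for $\mathrm{Cov}(Z, X_2)$ can be obtained by switching the
subscripts 1 and 2. As $\nu \to \infty$, the above expressions can be
reduced by replacing $t_\nu(\cdot)$ and $T_\nu(\cdot)$ by $\phi(\cdot)$ and
$\Phi(\cdot)$, respectively. On the other extreme, as $\nu \to 2+$, the
expressions could be reduced by using the fact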


that 
fal = fO ifs=0, 
a a a 1/2, ifs #0, 


7 1, if z > 0, 

lim, Ty 2 (y z7) = 1/2, ifa=O0, 

y—-2 Y 0, ifr <0. 
This suggests that the results for the maximum of bivariate $t$ distributed
errors may be materially different from those for bivariate normal errors.


Cain (1996) also investigated whether the maximum $Z$ can provide
information additional to that of $F_1$ and $F_2$ in forecasting $Y$ via a
linear combination of the form
$F = \alpha + \beta_1 F_1 + \beta_2 F_2 + \gamma M$ with
$\beta_1 + \beta_2 + \gamma = 1$. Cain showed that the mean squared error
of $F$ is minimized when $\gamma = 0$ and hence that $M$ is linearly
dominated by $F_1$ and $F_2$. Similar calculations reveal that the mean
forecast $(F_1 + F_2)/2$ dominates $M$ if and only if either
$\mu_1 = \mu_2$ or $\sigma_1 = \sigma_2$. Evidently further investigations
are in order (to consider, for example, the case of more than two
forecasts).
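
The mean of the maximum of two bivariate $t$ errors can be checked by simulation. The sketch below compares a Monte Carlo estimate with the expression $E(Z) = \mu_1 P(X_1 > X_2) + \mu_2 P(X_2 > X_1) + \tau\, t_{\nu-2}((\mu_1 - \mu_2)/\tau)$, where $\tau^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$ and $P(X_1 > X_2) = T_\nu(\sqrt{\nu/(\nu-2)}\,(\mu_1 - \mu_2)/\tau)$. It assumes NumPy for sampling; all function names, the parameter values, and the midpoint-rule cdf are choices made for this illustration only.

```python
import math
import numpy as np

def t_pdf(x, nu):
    """Student t pdf with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(x * x / nu))

def t_cdf(x, nu, n=4000):
    """Student t cdf via the midpoint rule on [0, x]; adequate for moderate x."""
    h = x / n
    return 0.5 + h * sum(t_pdf((i + 0.5) * h, nu) for i in range(n))

def expected_max(mu1, mu2, s1, s2, rho, nu):
    """Closed-form E[max(X1, X2)]; s1 and s2 are standard deviations, nu > 2."""
    tau = math.sqrt(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2)
    d = (mu1 - mu2) / tau
    p1 = t_cdf(math.sqrt(nu / (nu - 2.0)) * d, nu)          # P(X1 > X2)
    return mu1 * p1 + mu2 * (1.0 - p1) + tau * t_pdf(d, nu - 2.0)

def simulate_max(mu1, mu2, s1, s2, rho, nu, n, rng):
    """Monte Carlo estimate of E[max(X1, X2)] for the same bivariate t."""
    cov = np.array([[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]])
    scale = cov * (nu - 2.0) / nu        # scale matrix whose t covariance is cov
    y = rng.multivariate_normal([0.0, 0.0], scale, size=n)
    w = np.sqrt(rng.chisquare(nu, size=n) / nu)
    x = np.array([mu1, mu2]) + y / w[:, None]
    return float(x.max(axis=1).mean())

rng = np.random.default_rng(1)
args = (0.5, 0.0, 1.0, 2.0, 0.3, 5.0)    # mu1, mu2, sigma1, sigma2, rho, nu
print(expected_max(*args), simulate_max(*args, n=400_000, rng=rng))
```

The two printed values should agree up to Monte Carlo error, which shrinks as `n` grows.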


1.9 Distribution of a Linear Function 


If $X$ has the $p$-variate $t$ distribution with degrees of freedom $\nu$,
mean vector $\mu$, and correlation matrix $R$, then, for any nonsingular
scalar matrix $C$ and for any $a$, $CX + a$ has the $p$-variate $t$
distribution with degrees of freedom $\nu$, mean vector $C\mu + a$, and
correlation matrix $CRC^T$. This result is of importance in applications
and is similar to the corresponding result for the multivariate normal
distribution.


1.10 Marginal Distributions 


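Let $X$ possess the $p$-variate $t$ distribution with degrees of freedom
$\nu$, mean vector $\mu$, and correlation matrix $R$. Consider the
partitions

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \tag{1.11}
\]

\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \tag{1.12}
\]

and

\[
R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix},
\tag{1.13}
\]

where $X_1$ is $p_1 \times 1$ and $R_{11}$ is $p_1 \times p_1$. Then $X_1$
has the $p_1$-variate $t$ distribution with degrees of freedom $\nu$, mean
vector $\mu_1$, correlation matrix $R_{11}$, and with the joint pdf given
by

\[
f(x_1) = \frac{\Gamma((\nu + p_1)/2)}
              {(\pi\nu)^{p_1/2}\,\Gamma(\nu/2)\,|R_{11}|^{1/2}}
\left[1 + \frac{1}{\nu}(x_1 - \mu_1)^T R_{11}^{-1} (x_1 - \mu_1)
\right]^{-(\nu + p_1)/2}.
\]

Moreover, $X_2$ also has the $(p - p_1)$-variate $t$ distribution with
degrees of freedom $\nu$, mean vector $\mu_2$, correlation matrix
$R_{22}$, and with the joint pdf given by

\[
f(x_2) = \frac{\Gamma((\nu + p - p_1)/2)}
              {(\pi\nu)^{(p - p_1)/2}\,\Gamma(\nu/2)\,|R_{22}|^{1/2}}
\left[1 + \frac{1}{\nu}(x_2 - \mu_2)^T R_{22}^{-1} (x_2 - \mu_2)
\right]^{-(\nu + p - p_1)/2}.
\]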

1.11 Conditional Distributions 


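Several interesting properties have been obtained for conditional pdfs of
the multivariate $t$ distribution. If $X$ has the central $p$-variate $t$
distribution with degrees of freedom $\nu$ and correlation matrix $R$, it
then follows from Section 1.10 that the conditional pdf of $X_2$ given
$X_1$ is given by

\[
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
  {(\pi\nu)^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)}\,
  \frac{|R_{11}|^{1/2}}{|R|^{1/2}}\,
  \frac{\left[1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1\right]^{(\nu + p_1)/2}}
       {\left[1 + (1/\nu)\, x^T R^{-1} x\right]^{(\nu + p)/2}}.
\tag{1.14}
\]

Since

\[
|R| = |R_{11}|\, |R_{22} - R_{21} R_{11}^{-1} R_{12}|
\]

and

\[
x^T R^{-1} x = x_1^T R_{11}^{-1} x_1 + x_{2.1}^T R_{22.1}^{-1} x_{2.1},
\]

where

\[
x_{2.1} = x_2 - R_{21} R_{11}^{-1} x_1
\]

and

\[
R_{22.1} = R_{22} - R_{21} R_{11}^{-1} R_{12},
\]

one can rewrite (1.14) as

\[
\begin{aligned}
f(x_2 \mid x_1) = {} & \frac{\Gamma((\nu + p)/2)}
  {\{\pi(\nu + p_1)\}^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)\,
   |R_{22.1}|^{1/2}} \\
& \times \left[1 + \frac{1}{\nu + p_1}\,
  \frac{\{(\nu + p_1)/\nu\}\, x_{2.1}^T R_{22.1}^{-1} x_{2.1}}
       {1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1}\right]^{-(\nu + p)/2} \\
& \times \left[\frac{(\nu + p_1)/\nu}
       {1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1}\right]^{(p - p_1)/2}.
\end{aligned}
\tag{1.15}
\]

Landenna and Ferrari (1988) noted that this conditional pdf is not a
$(p - p_1)$-variate $t$ unless the values of $x_1$ are $\pm 1$. For
example, consider the special case of (1.15) for $R = I_p$. In this case,
(1.15) becomes

\[
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
  {\pi^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)
   \left(\nu + \sum_{j=1}^{p_1} x_j^2\right)^{(p - p_1)/2}}
\left[1 + \frac{1}{\nu + \sum_{j=1}^{p_1} x_j^2}
  \sum_{j=p_1+1}^{p} x_j^2\right]^{-(\nu + p)/2}.
\tag{1.16}
\]

When $x_j = \pm 1$, $j = 1, 2, \ldots, p_1$, (1.16) reduces to

\[
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
  {\pi^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)\,(\nu + p_1)^{(p - p_1)/2}}
\left[1 + \frac{1}{\nu + p_1} \sum_{j=p_1+1}^{p} x_j^2
\right]^{-(\nu + p)/2},
\]

which is the joint pdf of a central $(p - p_1)$-variate $t$ distribution
with degrees of freedom $(\nu + p_1)$ and correlation matrix $I_{p - p_1}$.
Landenna and Ferrari (1988) also described the manner in which the
probabilities of the conditional pdf (1.15) can be expressed in terms of
the probabilities of $x_2$ conditioned on $x_1$ taking the values $\pm 1$.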

The form of the conditional pdf (1.15) also suggests that

\[ Y_1 = X_1 \tag{1.17} \]

and

\[ Y_2 = \sqrt{\frac{v + p_1}{v}} \left( 1 + \frac{1}{v} X_1^T R_{11}^{-1} X_1 \right)^{-1/2} X_{2 \cdot 1} \tag{1.18} \]

are independent, that $Y_1$ has the central $p_1$-variate t distribution with degrees of freedom v and correlation matrix $R_{11}$, and that $Y_2$ has the central $(p - p_1)$-variate t distribution with degrees of freedom $v + p_1$ and correlation matrix $R_{22 \cdot 1}$. From this observation, it follows easily that the conditional expectation of $X_2$ given $X_1$ is linear and that $E(X_2 \mid X_1 = x_1) = R_{21} R_{11}^{-1} x_1$. In particular,

\[ E(X_p \mid X_1 = x_1, \ldots, X_{p-1} = x_{p-1}) = -\frac{1}{r^{pp}} \sum_{j=1}^{p-1} r^{pj} x_j \]

and

\[ Var(X_p \mid X_1 = x_1, \ldots, X_{p-1} = x_{p-1}) = \frac{1}{r^{pp}} \cdot \frac{v}{v + p - 3} \left[ 1 + \frac{1}{v} \sum_{j,k=1}^{p-1} \left( r^{jk} - \frac{r^{pj} r^{pk}}{r^{pp}} \right) x_j x_k \right], \tag{1.19} \]

where $r^{jk}$ is the (j, k)th element of $R^{-1}$ (Bennett, 1961). It is illuminating to compare the conditional variance (1.19) with the value $1/r^{pp}$ corresponding to the conditional variance of the multivariate normal distribution.
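As a rough illustration of the conditional moments, the sketch below evaluates the linear conditional mean and the conditional variance (1.19) of the last coordinate from the elements of the inverse correlation matrix. The helper names and the 2 x 2 example are assumptions made for illustration; the inverse matrix is supplied by the caller as nested lists, and the formula requires v + p - 3 > 0.

```python
def inverse_2x2(R):
    """Inverse of a 2 x 2 matrix given as nested lists (helper for the example)."""
    (a, b), (c, d) = R
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def conditional_mean_var_last(x_head, R_inv, v):
    """Mean and variance of X_p given X_1..X_{p-1} = x_head for a central
    p-variate t with inverse correlation matrix R_inv and v degrees of freedom."""
    p = len(R_inv)
    if v + p - 3 <= 0:
        raise ValueError("conditional variance requires v + p - 3 > 0")
    rpp = R_inv[p - 1][p - 1]
    mean = -sum(R_inv[p - 1][j] * x_head[j] for j in range(p - 1)) / rpp
    quad = sum((R_inv[j][k] - R_inv[p - 1][j] * R_inv[p - 1][k] / rpp)
               * x_head[j] * x_head[k]
               for j in range(p - 1) for k in range(p - 1))
    var = (1.0 / rpp) * (v / (v + p - 3.0)) * (1.0 + quad / v)
    return mean, var

if __name__ == "__main__":
    rho, v = 0.5, 5
    R_inv = inverse_2x2([[1.0, rho], [rho, 1.0]])
    print(conditional_mean_var_last([2.0], R_inv, v))
```

For p = 2 the formulas collapse to mean rho * x1 and variance (1 - rho^2)(v + x1^2)/(v - 1), so the conditional variance grows with x1^2, unlike the normal case where it stays at 1 - rho^2.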

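Siotani (1976) generalized the result of (1.17)-(1.18) by splitting X into more than two sets of variates. Let

\[ X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix} \tag{1.20} \]

and

\[ R = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1k} \\ R_{21} & R_{22} & \cdots & R_{2k} \\ \vdots & \vdots & & \vdots \\ R_{k1} & R_{k2} & \cdots & R_{kk} \end{pmatrix}, \]

where $X_l$ is $p_l \times 1$ for $l = 1, 2, \ldots, k$ and $R_{lm}$ is $p_l \times p_m$ for $l = 1, 2, \ldots, k$, $m = 1, 2, \ldots, k$. Clearly $p_1 + p_2 + \cdots + p_k = p$. Introducing the notations

\[ q_l = p_1 + p_2 + \cdots + p_l, \tag{1.21} \]

\[ X_{(l)} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_l \end{pmatrix}, \tag{1.22} \]

\[ R_{(l)} = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1l} \\ R_{21} & R_{22} & \cdots & R_{2l} \\ \vdots & \vdots & & \vdots \\ R_{l1} & R_{l2} & \cdots & R_{ll} \end{pmatrix}, \tag{1.23} \]

\[ R_{(l), l+1} = \begin{pmatrix} R_{1, l+1} \\ \vdots \\ R_{l, l+1} \end{pmatrix}, \tag{1.24} \]

and

\[ R_{l+1, l+1 \cdot (l)} = R_{l+1, l+1} - R_{(l), l+1}^T R_{(l)}^{-1} R_{(l), l+1}, \tag{1.25} \]

Siotani showed that

\[ Y_1 = X_1 \]

and

\[ Y_{l+1} = \sqrt{\frac{v + q_l}{v}} \left( 1 + \frac{1}{v} X_{(l)}^T R_{(l)}^{-1} X_{(l)} \right)^{-1/2} \left( X_{l+1} - R_{(l), l+1}^T R_{(l)}^{-1} X_{(l)} \right) \]

for $l = 1, \ldots, k - 1$ are independent, that $Y_1$ has the central $p_1$-variate t distribution with degrees of freedom v and correlation matrix $R_{11}$, and that $Y_{l+1}$ has the central $p_{l+1}$-variate t distribution with degrees of freedom $(v + q_l)$ and correlation matrix $R_{l+1, l+1 \cdot (l)}$ for $l = 1, \ldots, k - 1$. In the special case $R = I_p$, the Y's can be written as

\[ Y_1 = X_1 \]

and

\[ Y_{l+1} = \sqrt{\frac{v + q_l}{v}} \left( 1 + \frac{1}{v} \sum_{m=1}^{l} X_m^T X_m \right)^{-1/2} X_{l+1}. \]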

1.12 Quadratic Forms 


If X has the p-variate t distribution with degrees of freedom v, mean vector $\mu$, and correlation matrix R, then $X^T R^{-1} X / p$ has the noncentral F distribution with degrees of freedom p and v and noncentrality parameter $\mu^T R^{-1} \mu / p$. See Hsu (1990) for a particular case of this result. When $\mu = 0$, the distribution is central F and so $X^T R^{-1} X / (v + X^T R^{-1} X)$ has the Beta(p/2, v/2) distribution. There are a number of problems related to quadratic forms of multivariate t that are worthy of further investigation.
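A small Monte Carlo sketch can make the central case concrete. It assumes R = I_p and generates X as a standard normal vector divided by the square root of an independent chi-squared variable over v (a standard construction, used here as an assumption); the ratio Q/(v + Q) with Q = X^T X should then have mean p/(p + v), the mean of a Beta(p/2, v/2) variable. The seed and sample size are arbitrary choices.

```python
import random

def sample_quadratic_form(p, v, rng):
    """One draw of Q = X^T X for a central p-variate t vector with R = I_p."""
    w = rng.gammavariate(v / 2.0, 2.0)          # chi-squared with v degrees of freedom
    z2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p))
    return z2 * v / w                            # X = Z / sqrt(W / v)

def mean_beta_ratio(p, v, n=20000, seed=1):
    """Monte Carlo estimate of E[Q / (v + Q)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        q = sample_quadratic_form(p, v, rng)
        total += q / (v + q)
    return total / n

if __name__ == "__main__":
    p, v = 3, 5
    print(mean_beta_ratio(p, v), p / (p + v))
```

The estimate is only accurate to Monte Carlo error, so it is a sanity check on the Beta(p/2, v/2) statement rather than a proof.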




1.13 F Matrix 


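Consider two independent random samples $x_1^{(1)}, \ldots, x_{N_1}^{(1)}$ and $x_1^{(2)}, \ldots, x_{N_2}^{(2)}$ from two different elliptical distributions (which contain the multivariate t as a particular case, as already mentioned in Section 1.1). Let

\[ S_i = \sum_{j=1}^{N_i} \left( x_j^{(i)} - \bar{x}^{(i)} \right) \left( x_j^{(i)} - \bar{x}^{(i)} \right)^T \]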


for i = 1,2. Then F = (S1 /n1)/(S2/n2) is the multivariate F matrix. 
Hayakawa (1989) studied the asymptotic behavior of the determinant, 
latent roots, latent vectors, and the trace of the F matrix for an elliptical 
population. These results are useful in the study of the robustness of 
the statistics derived for testing several hypotheses about parameters of 
a normal population with the elliptical distribution introduced as the 
alternative population. Hayakawa (1989) illustrated the usefulness of 
the results through a multivariate t-population. 


1.14 Association 


The well known definition states that the random variables $X_1, \ldots, X_p$ are said to be associated if

\[ Cov(f(X_1, \ldots, X_p), g(X_1, \ldots, X_p)) \geq 0 \]

for all nondecreasing functions f, g (Esary et al., 1967). Association implies positive quadrant dependence, that is, that

\[ \Pr\left\{ \bigcap_{i=1}^{p} (X_i \leq x_i) \right\} \geq \prod_{i=1}^{p} \Pr(X_i \leq x_i) \]

for all real numbers $x_1, \ldots, x_p$ (Lehmann, 1966). Jogdeo (1977) and Abdel-Hameed and Sampson (1978) established that the components of a multivariate t random vector are associated under certain conditions on correlations. More generally, the following result holds. Let Z be a p-variate vector with independent and real components, each having a symmetric unimodal distribution. Suppose Y = Z + U, where U is independent of Z and either

(i) $U = (\alpha_1 V, \ldots, \alpha_k V, \alpha_{k+1} W, \ldots, \alpha_p W)$, where (V, W) has a bivariate normal distribution centered at 0,

(ii) or $U = \alpha W$, where $\alpha$ is an arbitrary but fixed p-variate vector and W is an arbitrary real random variable.




For (n + 1) independent and identically distributed (iid) copies $Y_i = (Y_{i1}, \ldots, Y_{ip})$, $i = 0, 1, \ldots, n$, of Y define $X_j^2$, $j = 1, \ldots, p$, by

\[ X_j^2 = \frac{Y_{0j}^2}{\sum_{i=1}^{n} Y_{ij}^2}. \]

Then the variables $X_j^2$ (or, equivalently, $|X_j|$), $j = 1, \ldots, p$, are associated.

Now, redefine Y as a p-variate normal random vector with zero means and covariance matrix specified by $\Sigma = \{ r_{ij} \sigma_i \sigma_j \}$. Let $S^2$ and $S_k^2$ be independent chi-squared random variables with degrees of freedom n and $q_k$, respectively, for $k = 1, \ldots, p$. Also assume that Y, $S^2$, and $S_k^2$ are mutually independent. Then, as a consequence of the above general result, one could provide the following assertions about bivariate and trivariate t vectors:

• For p = 2, the random variables

\[ (X_1, X_2) = \left( \frac{\sqrt{n}\, |Y_1|}{\sqrt{S^2 + S_1^2}}, \frac{\sqrt{n}\, |Y_2|}{\sqrt{S^2 + S_2^2}} \right) \]

are associated.

• For p = 3, if $\prod_{i<j} \mathrm{sign}(\lambda_{ij}) < 0$, where $\Lambda = \{ \lambda_{ij} \} = \Sigma^{-1}$, then the random variables

\[ (X_1, X_2, X_3) = \left( \frac{\sqrt{n}\, |Y_1|}{\sqrt{S^2 + S_1^2}}, \frac{\sqrt{n}\, |Y_2|}{\sqrt{S^2 + S_2^2}}, \frac{\sqrt{n}\, |Y_3|}{\sqrt{S^2 + S_3^2}} \right) \]

are associated.


1.15 Entropy 


The entropy of a continuous random vector X may be regarded as a 
descriptive quantity, just as the median, mode, variance, and the co- 
efficient of skewness may be regarded as descriptive parameters. The 
entropy is a measure of the extent to which a multivariate distribution 
is concentrated on a few points or dispersed over many points. Thus, the 
entropy is a measure of dispersion, somewhat like the standard deviation 
in the univariate case. 
Mathematically, the entropy of X is defined by

\[ H(X) = E[-\log f(X)] = -\int f(x) \log f(x)\, dx. \tag{1.26} \]


Guerrero-Cusumano (1996a) derived the forms of this for the multivariate t distribution. For a central p-variate t, it turns out that

\[ H(X; R) = \frac{1}{2} \log |R| + \log \left[ \frac{(v \pi)^{p/2}}{\Gamma(p/2)} B\left( \frac{p}{2}, \frac{v}{2} \right) \right] + \frac{v + p}{2} \left[ \psi\left( \frac{v + p}{2} \right) - \psi\left( \frac{v}{2} \right) \right], \tag{1.27} \]

where $\psi(t) = d \log \Gamma(t) / dt$ denotes the digamma function. Note that (1.27) can be reexpressed as $H(X) = (1/2) \log |R| + \Phi(v, p)$, where $\Phi(v, p)$ is a constant that depends only on v and p. Table 1 in Guerrero-Cusumano (1996a) tabulates $\Phi(v, p)$ for v = 1(1)35 and p = 1(1)5. The following is an abridged version of the table.


Constant Φ for H(X) = (1/2) log|R| + Φ(v, p)

  v      p=1       p=2       p=3       p=4       p=5
  1    2.53102   4.83788   7.06205   9.24381   11.3999
  2    1.96028   3.83788   5.67306   7.48261   9.27502
  3    1.77348   3.50454   5.20997   6.89826   8.57432
  4    1.68176   3.33788   4.97687   6.60362   8.22121
  5    1.62750   3.23788   4.83602   6.42500   8.00685
  6    1.59172   3.17121   4.74153   6.30474   7.86226
  7    1.56638   3.12359   4.67368   6.21809   7.75785
  8    1.54750   3.08788   4.62257   6.15261   7.67878
  9    1.53289   3.06010   4.58266   6.10135   7.61677
 10    1.52126   3.03788   4.55062   6.06010   7.56678

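The particular case of (1.27) for v = 1 gives the entropy for the multivariate Cauchy distribution

\[ H(X; R) = \frac{1}{2} \log |R| + \log \left[ \frac{\pi^{p/2}}{\Gamma(p/2)} B\left( \frac{p}{2}, \frac{1}{2} \right) \right] + \frac{1 + p}{2} \left[ \psi\left( \frac{1 + p}{2} \right) - \psi\left( \frac{1}{2} \right) \right]. \]

As $v \to \infty$, (1.27) converges to the entropy of the normal distribution given by

\[ H(X; R) = \frac{p}{2} \log(2 \pi e) + \frac{1}{2} \log |R|. \tag{1.28} \]

The sampling properties of (1.27) will be discussed in Chapter 9.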
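For the noncentral p-variate t, (1.26) takes the general form

\[ H(X; R) = \frac{1}{2} \log |R| + \log \left[ \frac{(v \pi)^{p/2}}{\Gamma(p/2)} B\left( \frac{p}{2}, \frac{v}{2} \right) \right] + \frac{v + p}{2} M(v, p, \Delta), \tag{1.29} \]

where $\Delta = \mu^T R^{-1} \mu$ and $M(v, p, \Delta)$ is given by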


wien) = (E)E E v} 


j=0 
Setting v = 1 in (1.29), one can obtain the entropy of the noncentral p- 
variate Cauchy distribution. In the case p = 1, (1.29) coincides with the 
entropies for the univariate Student’s t and Cauchy distributions given, 
for example, in Lazo and Rathie (1978). 

Zografos (1999) provided a maximum entropy characterization of (1.1). The maximum entropy principle suggests approximating the unknown pdf of X by the model that maximizes (1.26) subject to the constraints that define the class of pdfs considered. Jaynes (1957) asserted that the maximum entropy distribution, obtained by this constrained maximization problem, “is the only unbiased assignment we can make; to use any other would amount to an arbitrary assumption of information which by hypothesis we do not have.” Zografos (1999) showed that (1.1) is the solution to maximizing $E[-\log f(X)]$ subject to the constraint

\[ E\left[ \log \left\{ 1 + \frac{1}{v} (X - \mu)^T R^{-1} (X - \mu) \right\} \right] = \omega\left( \frac{v + p}{2}; \frac{p}{2} \right), \]

where $\omega(x; a) = \psi(x) - \psi(x - a)$, $x > a$, and $\psi(\cdot)$ denotes the digamma function. For further discussion of maximum entropy methods, see Fry (2002).


1.16 Kullback-Leibler Number 


The mutual information of a continuous random vector X with joint pdf f(x) and marginal pdfs $f(x_i)$, $i = 1, \ldots, p$, is defined by

\[ T(X) = \int f(x) \log \left\{ \frac{f(x)}{\prod_{i=1}^{p} f(x_i)} \right\} dx \tag{1.30} \]

with the domain of variation given by $0 \leq T(X) < \infty$. (The reader should not confuse this with the transformation T(X) given in (1.5).) The quantity (1.30) can be considered a measure of dependence (Joe, 1989). The larger the T(X), the higher the dependence among the variables $X_i$, $i = 1, \ldots, p$. Naturally, T(X) = 0 implies that the variables are independent; this latter statement follows from the fact that T is a special case of the Kullback-Leibler number, KL(f, g) (Kullback, 1968). When the variables of X are multivariate normal with covariance matrix $\Sigma$, it is easy to compute T(X) as the difference between entropies given by (1.28); specifically,

\[ T(X; \Sigma) = H(X; D) - H(X; \Sigma), \]

where D is the diagonal matrix formed from the diagonal elements $\sigma_{11}, \ldots, \sigma_{pp}$ of $\Sigma$. This is due to the well known fact that uncorrelatedness implies independence in the normal case. This fact also implies that T(X; I) = 0. In general, for any member of an elliptical family of distributions, this is not true; in other words, uncorrelatedness does not imply that T(X) = 0. The mutual information attempts to summarize in a single number the whole dependence structure of the multivariate distribution of X.

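Guerrero-Cusumano (1996b) derived the form of (1.30) for the multivariate t distribution. For a central p-variate t, it turns out that

\[ T(X) = \Omega - \frac{1}{2} \log |R|, \tag{1.31} \]

where $\Omega$ is given by

\[ \Omega = \log \left\{ \frac{\Gamma(p/2)}{\pi^{p/2}} \frac{[B(1/2, v/2)]^p}{B(p/2, v/2)} \right\} + \frac{p (1 + v)}{2} \left[ \psi\left( \frac{1 + v}{2} \right) - \psi\left( \frac{v}{2} \right) \right] - \frac{p + v}{2} \left[ \psi\left( \frac{p + v}{2} \right) - \psi\left( \frac{v}{2} \right) \right]. \tag{1.32} \]

It is easy to see that $\Omega \to 0$ as $v \to \infty$. The mutual information for the multivariate normal distribution with correlation matrix R is given by $-(1/2) \log |R|$ (Kullback, 1968). The particular case of (1.31) for v = 1 gives the mutual information for the multivariate Cauchy distribution with $\Omega$ taking the simpler form

\[ \Omega = \log \left\{ \frac{\pi^{p/2}\, \Gamma(p/2)}{B(p/2, 1/2)} \right\} + p \left[ \psi(1) - \psi\left( \frac{1}{2} \right) \right] - \frac{1 + p}{2} \left[ \psi\left( \frac{1 + p}{2} \right) - \psi\left( \frac{1}{2} \right) \right]. \]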




Table 1 in Guerrero-Cusumano (1996b) provides values of (1.32) for a 
range of v and p. The following is an abridged version. 


Constant Ω for T(X) = Ω − (1/2) log|R|

  v      p=2         p=3        p=4        p=5
  1    0.4196180   0.949615   1.530690   2.141170
  2    0.2927000   0.705474   1.184010   1.704100
  3    0.2254360   0.565424   0.975130   1.431820
  4    0.1835450   0.473177   0.832265   1.240460
  5    0.1548760   0.407380   0.727338   1.096790
  6    0.1339950   0.357917   0.646600   0.984235
  7    0.1180970   0.319304   0.582368   0.893344
  8    0.1055830   0.288289   0.529959   0.818244
  9    0.0954730   0.262813   0.486337   0.755056
 10    0.0871342   0.241503   0.449434   0.701101


Figures 1.5 and 1.6 graph T(X) in (1.31) for p = 2 and p = 4, respectively. The correlation matrix R is taken to have the equicorrelation structure $r_{ij} = \rho$, $i \neq j$. It is interesting to see the “dale-shaped” three-dimensional plot. The figures show that, as one moves toward the center of the “dale,” the dependence among the variables decreases, and, as one moves away from the center, the dependence increases.

For the normal case, Linfoot (1957) and Joe (1989) suggested a parameterization for T(X) to make it comparable to a correlation coefficient. They defined the induced correlation coefficient based on the mutual information as

\[ \rho_T = \sqrt{1 - \exp\{ -2 T(X) \}}. \tag{1.33} \]

Guerrero-Cusumano (1998) suggested a similar measure for the multivariate t distribution referred to as the dependence coefficient. It is given by

\[ \rho_T = \sqrt{1 - |R| \exp(-2 \Omega)}. \tag{1.34} \]


Fig. 1.5. Mutual information, (1.31), for p = 2

The dependence coefficient is a quantification of dependence among the p variables of X. This follows from the fact that independence implies $\rho_T = 0$ and that $T(X) = \infty$ implies $\rho_T = 1$. When $v \to \infty$, (1.34) coincides with (1.33).
The sampling properties of (1.31) will be discussed in Chapter 9. 
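To see how the induced correlation behaves, the sketch below evaluates (1.34) from a correlation matrix and a supplied value of Ω; with Ω = 0 it reduces to the normal case (1.33) with T(X) = -(1/2) log|R|. The determinant routine and all names are illustrative assumptions, and Ω must be provided by the caller.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def dependence_coefficient(R, omega=0.0):
    """rho_T = sqrt(1 - |R| exp(-2 * omega)); omega = 0 gives the normal case."""
    value = 1.0 - determinant(R) * math.exp(-2.0 * omega)
    return math.sqrt(max(value, 0.0))   # guard against tiny negative round-off

if __name__ == "__main__":
    print(dependence_coefficient([[1.0, 0.6], [0.6, 1.0]]))
    print(dependence_coefficient([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]))
```

In the bivariate normal case the coefficient equals the absolute value of the ordinary correlation, which is what makes the parameterization comparable to a correlation coefficient.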


1.17 Rényi Information 


Fig. 1.6. Mutual information, (1.31), for p = 4

Since the concept of Rényi information is not widely available in the literature, we provide here a brief discussion of the concept. Rényi information of order $\lambda$ for a continuous random variable with pdf f is defined as

\[ I_R(\lambda) := \frac{1}{1 - \lambda} \log \left( \int f^{\lambda}(x)\, dx \right) \tag{1.35} \]

for $\lambda \neq 1$. Its value for $\lambda = 1$ is taken as the limit

\[ I_R(1) := \lim_{\lambda \to 1} I_R(\lambda) = -\int f(x) \log f(x)\, dx = -E[\log f(X)], \]

which is the well known Shannon entropy. Rényi's (1959, 1960, 1961) generalization of the Shannon entropy allows for “different averaging of probabilities” via $\lambda$. Sometimes (1.35) is also referred to as the spectrum of Rényi information. Rényi information finds its applications as a measure of complexity in areas of physics, information theory, and engineering to describe many nonlinear dynamical or chaotic systems (Kurths et al., 1995) and in statistics as certain appropriately scaled test statistics (Rényi distances or relative Rényi information) for testing hypotheses in parametric models (Morales et al., 1997). The gradient $I_R'(\lambda) = dI_R(\lambda)/d\lambda$ also conveys useful information. In fact, a direct calculation based on (1.35), assuming that the integral $\int f^{\lambda}(x)\, dx$ is well defined and differentiation operations are legitimate, shows that

\[ I_R'(1) = \lim_{\lambda \to 1} \frac{1}{(1 - \lambda)^2} \left[ (1 - \lambda) \frac{\int f^{\lambda}(x) \log f(x)\, dx}{\int f^{\lambda}(x)\, dx} + \log \left( \int f^{\lambda}(x)\, dx \right) \right] \]

\[ = -\frac{1}{2} \lim_{\lambda \to 1} \left[ \frac{\int f^{\lambda}(x) \log^2 f(x)\, dx}{\int f^{\lambda}(x)\, dx} - \left( \frac{\int f^{\lambda}(x) \log f(x)\, dx}{\int f^{\lambda}(x)\, dx} \right)^2 \right] \]

\[ = -\frac{1}{2} Var[\log f(X)]. \]

In other words, the gradient of Rényi information at $\lambda = 1$ is simply the negative half of the variance of the log-likelihood, just as the entropy is the negative of the expected log-likelihood. Thus, the variance of the log-likelihood $I_f := -2 I_R'(1)$ measures the intrinsic shape of the distribution. This can be seen by observing that $I_f = I_g$, where $f(x) = (1/\sigma)\, g((x - \mu)/\sigma)$; that is, the measure is unaffected by location and scale. In fact, according to Bickel and Lehmann (1975), it can serve as a measure of the shape of a distribution. In the case where f(x) has a finite fourth moment, it plays a similar role as a kurtosis measure in comparing the shapes of various frequently used densities and measuring the heaviness of tails, but it measures more than what kurtosis measures.

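Rényi information of order $\lambda$ for a p-variate random vector with joint pdf f is defined as

\[ I_R(\lambda) := \frac{1}{1 - \lambda} \log \left( \int \cdots \int f^{\lambda}(x_1, \ldots, x_p)\, dx_1 \cdots dx_p \right). \tag{1.36} \]

The gradient $I_R'(\lambda)$ and the measure $I_f$ are defined similarly.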

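Song (2001) provided a comprehensive account of $I_R(\lambda)$, $I_R'(\lambda)$, and $I_f$ for well known univariate and multivariate distributions. For the univariate Student's t distribution with degrees of freedom v, it can be shown for $\lambda > 1/(1 + v)$ that

\[ I_R(\lambda) = \frac{1}{1 - \lambda} \log \left\{ \frac{B((v \lambda + \lambda - 1)/2, 1/2)}{B^{\lambda}(v/2, 1/2)} \right\} + \frac{1}{2} \log(v), \]

\[ I_R'(\lambda) = \frac{1}{(1 - \lambda)^2} \left\{ \log \left[ \frac{B((v \lambda + \lambda - 1)/2, 1/2)}{B(v/2, 1/2)} \right] + \frac{(1 - \lambda)(v + 1)}{2} \left[ \psi\left( \frac{v \lambda + \lambda - 1}{2} \right) - \psi\left( \frac{(v + 1) \lambda}{2} \right) \right] \right\}, \]

and

\[ I_f(v) = \left( \frac{v + 1}{2} \right)^2 \left[ \psi'\left( \frac{v}{2} \right) - \psi'\left( \frac{v + 1}{2} \right) \right]. \]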


Using tables in Abramowitz and Stegun (1965), one obtains the particular values

\[ I_f(1) = \frac{\pi^2}{3}, \]

\[ I_f(2) = 9 - \frac{3 \pi^2}{4}, \]

\[ I_f(3) = \frac{4 \pi^2}{3} - 12, \]

\[ I_f(4) = \frac{775}{36} - \frac{25 \pi^2}{12}, \]

and

\[ I_f(5) = 3 \pi^2 - \frac{115}{4}. \]

It is interesting to note that the measure $I_f(v)$ decreases as v increases, which makes sense since the tails become lighter as v increases. In fact, it can be shown, using asymptotic formulas for the trigamma function, that $\lim_{v \to \infty} I_f(v) = 1/2$, which corresponds to the measure $I_f$ for the normal distribution.
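The Student's t case is easy to explore numerically. The sketch below rests on two assumptions that should be read as such: the Rényi information is taken as (1/(1 - λ)) log[B((vλ + λ - 1)/2, 1/2)/B^λ(v/2, 1/2)] + (1/2) log v, and the shape measure is taken as ((v + 1)/2)^2 [ψ'(v/2) - ψ'((v + 1)/2)], which follows from 1/(1 + X^2/v) having a Beta(v/2, 1/2) distribution. The trigamma routine is a hand-written approximation, and all names are illustrative.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def renyi_student_t(lam, v):
    """Renyi information of order lam (lam != 1) for Student's t with v d.f."""
    if lam <= 1.0 / (1.0 + v) or lam == 1.0:
        raise ValueError("need lam > 1/(1+v) and lam != 1")
    a = (v * lam + lam - 1.0) / 2.0
    return ((log_beta(a, 0.5) - lam * log_beta(v / 2.0, 0.5)) / (1.0 - lam)
            + 0.5 * math.log(v))

def trigamma(x):
    """psi'(x) for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result + inv + 0.5 * inv2 + tail

def shape_measure(v):
    """Variance of log f(X) for Student's t with v degrees of freedom."""
    return ((v + 1.0) / 2.0) ** 2 * (trigamma(v / 2.0) - trigamma((v + 1.0) / 2.0))

if __name__ == "__main__":
    print(renyi_student_t(2.0, 1), math.log(2 * math.pi))
    print([round(shape_measure(v), 6) for v in (1, 2, 3, 4, 5, 1000)])
```

For the Cauchy case the order-2 value is log(2π), orders close to 1 approach the Shannon entropy log(4π), and the shape measure falls from π²/3 at v = 1 toward 1/2 as v grows.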

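For the central p-variate t distribution with correlation matrix R and degrees of freedom v, it can be shown for $\lambda > p/(p + v)$ that

\[ I_R(\lambda) = \frac{1}{1 - \lambda} \log \left\{ \frac{B((v \lambda + p \lambda - p)/2, p/2)}{B^{\lambda}(v/2, p/2)} \right\} + \frac{1}{2} \log \left( (\pi v)^p |R| \right) - \log \Gamma\left( \frac{p}{2} \right), \]

\[ I_R'(\lambda) = \frac{1}{(1 - \lambda)^2} \left\{ \log \left[ \frac{B((v \lambda + p \lambda - p)/2, p/2)}{B(v/2, p/2)} \right] + \frac{(1 - \lambda)(p + v)}{2} \left[ \psi\left( \frac{v \lambda + p \lambda - p}{2} \right) - \psi\left( \frac{(p + v) \lambda}{2} \right) \right] \right\}, \]

and

\[ I_f(v) = \left( \frac{v + p}{2} \right)^2 \left[ \psi'\left( \frac{v}{2} \right) - \psi'\left( \frac{v + p}{2} \right) \right]. \]

For p = 1, these expressions reduce to those derived for the Student's t distribution.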


1.18 Identities 


In some of the earliest papers on the subject, Dickey (1965, 1968) provided two multidimensional-integral identities involving the multivariate t distribution. The first identity expresses a moment of a product of multivariate t densities of the form (1.1) as an integral of dimension 1 less than the number of factors. Consider the product


K 


g(x) = [I [1+ (= m)" Re (= m) 
k=1 


ae (1.37) 


where each R; > 0 and 4% > 0, and so each term may not have a finite 
integral. The identity seeks an expression for the complete p-dimensional 
integral of s - g, where s(x) is a polynomial in the coordinates of x. Let 
Y be a p-variate normal random vector with the covariance matrix and 
mean vector given by 


R =1 
D;' = (>: uma) 
k=1 


and 
K 
Bu = Dy Y Rehr: 
k=1 
respectively. For given constants ck > 0, k = 1,...,K, let u. = 


DA Cruz and up = vpu.. Then the quantity defined by Nsju 
E(s(Y)) can be expanded as a polynomial in 1/u. as 


Nolu = yA; (v1, . ..,UK)UD?. 
j 




Given this terminology, the identity can now be expressed as 


f sax 
= Ko Dor (4-3) [D l? hy (vr... 10K) 


K 
x (i op) Wi-(-P)/2 dy, duK—1, (1.38) 
where 


K 
Ko = m’ J[ E 0/2, 
k=1 


K 
w= J Vk, 
k=1 


K 
D, = > wR, 
k=1 


k=1 


K K T K 
W, = Door {1 + LE ReH} = (>: nRa) D7’ c nRa) 
k=1 k=1 


and ø is the simplex 


K 
oc = fon: Jansa nol. 
k=l 


This identity has applications to inference concerning the location pa- 
rameters of a multivariate normal distribution. In the particular case 
K = 2, Ry = ypIp, and s = 1, (1.38) reduces to 


= (v.-p)/2p (Y1 V2 
f 00a = ore (4,22) 


where 


DE «a 
r (vı /2)T (v2/2) yy eager 




$F_1$ is Appell's hypergeometric function of two variables defined by

\[ F_1(\alpha; \beta, \beta'; \gamma; x, y) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\, \Gamma(\gamma - \alpha)} \int_0^1 t^{\alpha - 1} (1 - t)^{\gamma - \alpha - 1} (1 - t x)^{-\beta} (1 - t y)^{-\beta'}\, dt \tag{1.40} \]

(see, for example, Erdélyi et al., 1953), and $z_1$ and $z_2$ are the two real roots of the equation


Y 
2+ (n A P+ -1)2—m iene EE 


The integral (1.39) is proportional to a multivariate generalization of the Behrens-Fisher density. For an asymptotic expansion of (1.37) in powers of $v_k$, see Dickey (1967a).
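Appell's function in (1.40) can be evaluated directly from its Euler-type integral when the integrand is bounded. The sketch below is a minimal illustration under that assumption (it requires α ≥ 1, γ - α ≥ 1, and x, y < 1) and uses composite Simpson's rule; it is not a general-purpose implementation, and the function name is illustrative.

```python
import math

def appell_f1(alpha, beta, beta2, gamma, x, y, n=2000):
    """Appell F1 via the integral (1.40), composite Simpson's rule on [0, 1].

    Restricted to alpha >= 1, gamma - alpha >= 1, x < 1, y < 1 so that the
    integrand has no endpoint singularities.
    """
    if alpha < 1 or gamma - alpha < 1 or x >= 1 or y >= 1:
        raise ValueError("parameters outside the range handled by this sketch")
    if n % 2:
        n += 1                                   # Simpson's rule needs an even n

    def integrand(t):
        return (t ** (alpha - 1) * (1 - t) ** (gamma - alpha - 1)
                * (1 - t * x) ** (-beta) * (1 - t * y) ** (-beta2))

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    coef = math.exp(math.lgamma(gamma) - math.lgamma(alpha)
                    - math.lgamma(gamma - alpha))
    return coef * total * h / 3.0

if __name__ == "__main__":
    # With y = 0 the function reduces to a Gauss hypergeometric function;
    # here 2F1(1, 1; 2; x) = -log(1 - x) / x.
    print(appell_f1(1, 1, 1, 2, 0.5, 0.0), -math.log(0.5) / 0.5)
```

The two special cases used as checks are F1(1; 1, β'; 2; x, 0) = -log(1 - x)/x and F1(1; 1, 1; 2; x, x) = 1/(1 - x).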

The second identity given by Dickey (1968) (see also Dickey, 1966b) expresses the density of a linear combination of independently distributed multivariate t vectors as an integral of dimension 1 less than the number of summands. Consider the r-variate vector $\delta$ formed by the linear combination

\[ \delta = \sum_{k=1}^{K} B_k X_k, \]

where $X_k$ are independent $q_k$-variate standard t random vectors with zero means, covariance matrix $I_{q_k}$, and degrees of freedom $v_k$. Dickey (1968) showed that $\delta$ has the representation

\[ \delta = \left[ \sum_{k=1}^{K} v_k U_k^{-1} B_k B_k^T \right]^{1/2} Y, \]

where $U_k$ are independent chi-squared random variables with degrees of freedom $v_k$ and Y is an independent r-variate standard normal vector. As a consequence, $\delta$ has the further representation

\[ \delta = \left[ \sum_{k=1}^{K} \frac{v_k}{v_{\cdot} V_k} B_k B_k^T \right]^{1/2} W, \]

where $v_{\cdot} = \sum_{k=1}^{K} v_k$, $V_k = U_k / \sum_{j=1}^{K} U_j$, and W is an independent r-variate standard t vector with degrees of freedom $v_{\cdot}$. If the matrix $\sum_k B_k B_k^T$ is nonsingular, the distribution of $\delta$ is nondegenerate with the




joint pdf 
K K -1 —(v.+r)/2 
f(6) = C (Ir) tar (Sor BBE) 5 
o \k=l k=1 
K 
/ XO (ve/ve) BBT dv; «++ dui, (1.41) 
k=1 


where 
C((v. +7) /2) 


as nT (14 /2)---T (ve /2) 


and as above 


K 
c= {renstK) Seam = no}, 


k=1 


This identity has applications in Behrens-Fisher problems. The version 
of (1.41) for K = 2 and By = fy is 


7 vi +p n+p 
O e a+?) 


v tp v.+pv.+p v. 
x Fy (=e, 9 Gt piane), 


where 
T ((v. + p) /2) 2\ 41/2 2\—(vitp)/2 
TPIT (11/2): -P (v2/2) (vbi) (v262) ’ 
$F_1$ is Appell's hypergeometric function as defined in (1.40), and $z_1$ and $z_2$ are the two real roots of the equation


+ (LE oft 1), Lee 
v2 33 V2 83 vp} 
This special case is essentially equivalent to the two-factor version of 
(1.38). Moreover, (1.41) is a generalization of Ruben’s (1960) integral 
representation (in the univariate case) for the usual Behrens-Fisher den- 
sities. 


1.19 Some Special Cases 


A number of special cases of (1.1) have been studied in the literature in great detail. Cornish (1954), in his early paper, considered the special case of (1.1) when $\mu = 0$ and R is given by the equicorrelation matrix

\[ R = \begin{pmatrix} 1 & -1/p & \cdots & -1/p \\ -1/p & 1 & \cdots & -1/p \\ \vdots & \vdots & & \vdots \\ -1/p & -1/p & \cdots & 1 \end{pmatrix}. \]


The following interesting properties were established:

• $X^T R^{-1} X$ has the noncentral F distribution with degrees of freedom p and v.

• $X^T R^{-1} X$ has the Fisher's z distribution with degrees of freedom p − q and v when X is subject to the linearly independent homogeneous conditions represented by the equation SX = 0, where S is of order q × p and rank q < p.

• The cdf of the quadratic form $Q = X^T A X$ when A is of rank q < p is given by


eaS S E) T a 


where $x^T = (x_1, \ldots, x_q)$ and the domain of integration is defined by

\[ \sum_{i=1}^{q} \lambda_i x_i^2 \leq Q, \]

where $\lambda_i$ are the roots of the equation $|\lambda R^{-1} - A| = 0$ or, alternatively, the latent roots of the matrix RA. Consequently, the distribution of $X^T A X$ is Fisher's z with degrees of freedom q and v if and only if the nonzero latent roots of RA are all equal to unity.

If the distribution of X is partitioned as in (1.11)-(1.13), then 


E(X:|X2) = -RI Rex, 
and 
Var (X;|X2) = v +x (Raz — Ra Ri Ruz) x p1. 
v+p-p -2 
In the particular case pı = 1, 
12 
E(X|X2) = -59 %5 


j=2 




(p+ 1)y : 
VOR a) > op 8)" de pe TE 
p+1 
ETETE Fp I yt 
E[Var(Xı | X2)] = ie 
Var (X2) = — Ro, 
Cov (X1, Xi) x 


“oa 4=2,...,p. 

Furthermore, the residual variance of X, with respect to X3 is 
v pt+l 

y—2 “2p 


3 


and the partial correlation coefficient of X, with respect to X, is 
—1/2. 


Patil and Kovner (1968) provided a detailed study of the trivariate t density

\[ f(x_1, x_2, x_3) = \frac{\Gamma((n + 3)/2)}{(n \pi)^{3/2} \sqrt{1 - \rho^2}\, \Gamma(n/2)} \left( 1 + \frac{x_1^2 - 2 \rho x_1 x_2 + x_2^2}{n (1 - \rho^2)} + \frac{x_3^2}{n} \right)^{-(n + 3)/2}. \]

Among other results, Taylor series expansions (in powers of 1/n) of the density and associated probabilities in rectangles were given.


2 


The Characteristic Function 


The characteristic function (cf) of the univariate Student's t distribution for odd degrees of freedom was derived by Fisher and Healy (1956). Ifram (1970) gave a general result for all degrees of freedom, but Pestana (1977) pointed out that this result is not quite correct. More recent derivations are presented in Dreier and Kotz (2002). Here we discuss two independent results on the characteristic function for the multivariate t distribution. The first one, due to Sutradhar (1986, 1988a), provides a series representation for the cf while the other, due to Joarder and Ali (1996), derives an expression in terms of the MacDonald function. The expressions given are rather complicated; thus, further research and possible simplifications may be desirable.


2.1 Sutradhar’s Approach 
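Let X be distributed according to

\[ f(x) = \frac{\Gamma((v + p)/2)}{(\pi v)^{p/2}\, \Gamma(v/2)\, |R|^{1/2}} \left[ 1 + \frac{1}{v} (x - \mu)^T R^{-1} (x - \mu) \right]^{-(v + p)/2}. \tag{2.1} \]

Consider the transformation $Y = R^{-1/2} (X - \mu)$. It then follows that the joint pdf of Y is

\[ f_Y(y) = \frac{\Gamma((v + p)/2)}{(\pi v)^{p/2}\, \Gamma(v/2)} \left( 1 + \frac{y^T y}{v} \right)^{-(v + p)/2}. \]

The cf of Y is

\[ \phi_Y(t; v) = \frac{\Gamma((v + p)/2)}{(\pi v)^{p/2}\, \Gamma(v/2)} \int_{R^p} \exp(i t^T y) \left( 1 + \frac{y^T y}{v} \right)^{-(v + p)/2} dy_1\, dy_2 \cdots dy_p. \]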


To evaluate this integral, Sutradhar (1986) makes the orthogonal transformation Y = TZ, where the first column of the p × p matrix T is the vector

\[ \left( \frac{t_1}{\| t \|}, \frac{t_2}{\| t \|}, \ldots, \frac{t_p}{\| t \|} \right)^T \]

with $\| t \| = \sqrt{t^T t}$. It follows that the cf of Z is given by

\[ \phi_Z(t; v) = \frac{\Gamma((v + p)/2)\, v^{v/2}}{\pi^{p/2}\, \Gamma(v/2)} \int_{-\infty}^{\infty} \exp(i \| t \| z_1)\, dz_1 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( c_p + z_p^2 \right)^{-(v + p)/2} dz_2 \cdots dz_p, \tag{2.2} \]

where $z_k \in R$, $k = 1, \ldots, p$, and $c_p = v + \sum_{k=1}^{p-1} z_k^2$. Successive integration of (2.2) with respect to $z_p, z_{p-1}, \ldots, z_2$ yields

\[ \phi_Z(t; v) = \frac{v^{v/2}\, \Gamma((v + 1)/2)}{\sqrt{\pi}\, \Gamma(v/2)} I_1, \tag{2.3} \]

where

\[ I_1 = \int_{-\infty}^{\infty} \exp(i \| t \| w) \left( w^2 + v \right)^{-(v + 1)/2} dw. \tag{2.4} \]

Note that $I_1$ is an improper integral along the real axis, where w denotes a complex variable. For odd v, the integrand has poles of integer order while, for fractional and even v, the poles are of fractional order. Sutradhar (1986) evaluated $I_1$ separately for the three cases: odd v; even v; and fractional v, using the relations that

\[ \phi_X(t; v) = \exp(i t^T \mu)\, \phi_Y(R^{1/2} t; v) \tag{2.5} \]

and

\[ \phi_Y(t; v) = \phi_Z(t; v) \tag{2.6} \]

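to obtain the following expressions for $\phi_X$.
For the case of odd v

\[ \phi_X(t; v) = \frac{\sqrt{\pi}\, \Gamma((v + 1)/2) \exp\left( i t^T \mu - \sqrt{v t^T R t} \right)}{2^{v - 1}\, \Gamma(v/2)} \sum_{k=1}^{m} \binom{2m - k - 1}{m - k} \frac{\left( 2 \sqrt{v t^T R t} \right)^{k - 1}}{(k - 1)!}, \]

where m = (v + 1)/2. For the case of even v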


(=P ((v + 1)/2) exp (it p) 
VaT (v/2) Tea (m — k + 1/2) 


Sa (SE AmS Tl @-» 
j=0 \k=0, kj 
vtTRt T'(n+1) 
-I F ) (10g 4 -Fe h. ae 


where m = v/2. Finally, when v is of fractional order 


ọx(t;v) = 


m(—1)™v/2)-™P ((v + 1)/2) exp (it? u) 
XT (v/2) sin(Em)E (€ + 1/2) k- (444 — k) 


2n 
< |1 /vVvtTRt 2 Ty (n -— E- k) 
«> [3 / 2 l SER 


_ (eT Rt)§ Ilgo (n-k) Y 


ọx(t;v) = 


XT (n+1+€) (2.8) 


where m = [(v + 1)/2] is the integer part of (v + 1)/2 and $\xi = (v/2) - m$ is such that $0 < |\xi| < 1/2$.

Both (2.7) and (2.8) involve infinite series. Checking the convergence of these series is an open problem. For another series representation for the cf of the multivariate t, see Javier and Srivastava (1988).
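For odd degrees of freedom the cf is a finite sum and can be evaluated exactly. The sketch below assumes the odd-v form exp(-r) √π Γ((v + 1)/2) / (2^{v-1} Γ(v/2)) Σ_{k=1}^{m} C(2m - k - 1, m - k) (2r)^{k-1}/(k - 1)! with r = √(v t^T R t), m = (v + 1)/2, and μ = 0 (so the cf is real). The quadratic form q = t^T R t is passed in directly, and the function name is illustrative.

```python
import math

def cf_odd_df(q, v):
    """cf of a central multivariate t at t, for odd v, given q = t^T R t and mu = 0."""
    if v < 1 or v % 2 != 1:
        raise ValueError("v must be a positive odd integer")
    m = (v + 1) // 2
    r = math.sqrt(v * q)
    prefactor = (math.sqrt(math.pi)
                 * math.exp(math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0) - r)
                 / 2.0 ** (v - 1))
    total = sum(math.comb(2 * m - k - 1, m - k) * (2.0 * r) ** (k - 1)
                / math.factorial(k - 1)
                for k in range(1, m + 1))
    return prefactor * total

if __name__ == "__main__":
    r3 = math.sqrt(3.0)
    print(cf_odd_df(1.0, 1), math.exp(-1.0))                 # Cauchy
    print(cf_odd_df(1.0, 3), (1 + r3) * math.exp(-r3))       # v = 3
```

The checks use the familiar univariate forms exp(-|t|) for v = 1, (1 + r) exp(-r) for v = 3, and (1 + r + r²/3) exp(-r) for v = 5. For a nonzero mean vector the value would be multiplied by exp(i t^T μ).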


2.2 Joarder and Ali’s Approach 


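An integral representation of the MacDonald function is

\[ K_{\alpha}(r) = \left( \frac{2}{r} \right)^{\alpha} \frac{\Gamma(\alpha + 1/2)}{\sqrt{\pi}} \int_0^{\infty} \left( 1 + u^2 \right)^{-(\alpha + 1/2)} \cos(r u)\, du, \tag{2.9} \]

where r > 0 and $\alpha > -1/2$ (see, for example, Watson, 1958, page 172). A series representation of $K_{\alpha}(r)$ for r > 0 and positive integer $\alpha$ is given by

\[ K_{\alpha}(r) = \frac{1}{2} \sum_{j=0}^{\alpha - 1} \frac{(-1)^j (\alpha - j - 1)!}{j!} \left( \frac{r}{2} \right)^{2j - \alpha} + (-1)^{\alpha + 1} \sum_{j=0}^{\infty} \frac{(r/2)^{2j + \alpha}}{j!\, (\alpha + j)!} \left[ \log \left( \frac{r}{2} \right) - \frac{1}{2} \psi(1 + j) - \frac{1}{2} \psi(1 + \alpha + j) \right] \tag{2.10} \]

(see, for example, Lebedev, 1965, pages 107 and 110). A series representation of $K_{\alpha}(r)$ for r > 0 and nonintegral positive values of $\alpha$ is

\[ K_{\alpha}(r) = 2^{\alpha - 1}\, \Gamma(\alpha) \sum_{j=0}^{\infty} \frac{r^{2j - \alpha}}{4^j\, j!\, (1 - \alpha)_j} + 2^{-\alpha - 1}\, \Gamma(-\alpha) \sum_{j=0}^{\infty} \frac{r^{2j + \alpha}}{4^j\, j!\, (1 + \alpha)_j} \tag{2.11} \]

(see, for example, Spanier and Oldham, 1987, Chapter 51), where $(c)_j = c (c + 1) \cdots (c + j - 1)$ denotes the ascending factorial. Using (2.9), Joarder and Ali (1996) rewrote the integral (2.4) as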


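\[ I_1 = \int_{-\infty}^{\infty} \left\{ \cos(\| t \| w) + i \sin(\| t \| w) \right\} \left( w^2 + v \right)^{-(v + 1)/2} dw \]

\[ = 2 v^{-(v + 1)/2} \int_0^{\infty} \cos(\| t \| w) \left( 1 + \frac{w^2}{v} \right)^{-(v + 1)/2} dw \]

\[ = \frac{2 \sqrt{\pi}}{v^{v/2}\, \Gamma((v + 1)/2)} \left( \frac{\| \sqrt{v}\, t \|}{2} \right)^{v/2} K_{v/2}\left( \| \sqrt{v}\, t \| \right). \]

Thus, using (2.3), one obtains

\[ \phi_Z(t) = \frac{\| \sqrt{v}\, t \|^{v/2}}{2^{v/2 - 1}\, \Gamma(v/2)} K_{v/2}\left( \| \sqrt{v}\, t \| \right). \]

Hence, using the relationships (2.5) and (2.6), one arrives at the expression for the cf given by Joarder and Ali (1996)

\[ \phi_X(t) = \exp(i t^T \mu) \frac{\| \sqrt{v} R^{1/2} t \|^{v/2}}{2^{v/2 - 1}\, \Gamma(v/2)} K_{v/2}\left( \| \sqrt{v} R^{1/2} t \| \right). \tag{2.12} \]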


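Joarder and Ali also provided expansions of this cf using the series representations (2.10) and (2.11). For positive and even v, applying (2.10), one obtains

\[ \phi_X(t) = \exp(i t^T \mu) \left\{ \sum_{j=0}^{v/2 - 1} C_1(j)\, \| \sqrt{v} R^{1/2} t \|^{2j} + \sum_{j=0}^{\infty} C_2(j)\, \| \sqrt{v} R^{1/2} t \|^{2j + v} - 2 \sum_{j=0}^{\infty} C_3(j)\, \| \sqrt{v} R^{1/2} t \|^{2j + v} \log \left( \| \sqrt{v} R^{1/2} t \| \right) \right\}, \]

where

\[ C_1(j) = \frac{(-1)^j (v/2 - j - 1)!}{4^j\, j!\, (v/2 - 1)!}, \]

\[ C_2(j) = \frac{(-1)^{v/2} \left\{ \psi(1 + j) + \psi(1 + v/2 + j) + \log 4 \right\}}{2^v\, 4^j\, j!\, (v/2 - 1)!\, (v/2 + j)!}, \]

and

\[ C_3(j) = \frac{(-1)^{v/2}}{2^v\, 4^j\, j!\, (v/2 - 1)!\, (v/2 + j)!}. \]

For positive and odd or fractional v, applying (2.11), the cf (2.12) becomes

\[ \phi_X(t) = \exp(i t^T \mu) \sum_{j=0}^{\infty} \left\{ D_1(j)\, \| \sqrt{v} R^{1/2} t \|^{2j} + D_2(j)\, \| \sqrt{v} R^{1/2} t \|^{2j + v} \right\}, \]

where

\[ D_1(j) = \frac{1}{4^j\, j!\, (1 - v/2)_j} \]

and

\[ D_2(j) = \frac{2^{-v}\, 4^{-j}\, \Gamma(-v/2)}{\Gamma(v/2)\, j!\, (1 + v/2)_j}. \]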


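Since the univariate Student's t, multivariate Cauchy, and Pearson's type VII are all particular cases of (2.1), the corresponding cfs in terms of the MacDonald function can be obtained from (2.12). They are as follows:

• For the univariate Student's t distribution with the pdf

\[ f(x) = \frac{\Gamma((1 + v)/2)}{\sqrt{v \pi}\, \Gamma(v/2)} \left( 1 + \frac{x^2}{v} \right)^{-(1 + v)/2} \]

(where $x \in R$ and v > 0), the cf in terms of the MacDonald function is

\[ \phi_X(t) = \frac{| \sqrt{v}\, t |^{v/2}}{2^{v/2 - 1}\, \Gamma(v/2)} K_{v/2}\left( \sqrt{v}\, |t| \right) \]

(compare with Dreier and Kotz, 2002).

• For the p-variate Cauchy distribution with the joint pdf

\[ f(x) = \frac{\Gamma((1 + p)/2)}{\pi^{(1 + p)/2}\, |R|^{1/2}} \left[ 1 + (x - \mu)^T R^{-1} (x - \mu) \right]^{-(1 + p)/2} \]

(where $x \in R^p$), the cf is

\[ \phi_X(t) = \exp(i t^T \mu) \sqrt{2/\pi}\, \| R^{1/2} t \|^{1/2} K_{1/2}\left( \| R^{1/2} t \| \right) = \exp\left( i t^T \mu - \| R^{1/2} t \| \right), \]

which follows by using the result that $K_{1/2}(r) = \sqrt{\pi/(2r)} \exp(-r)$ (see, for example, Tranter, 1968, page 19).

• For the p-variate Pearson type VII distribution with the joint pdf

\[ f(x) = \frac{\Gamma(m)}{(\pi v)^{p/2}\, \Gamma(m - p/2)\, |R|^{1/2}} \left[ 1 + \frac{1}{v} (x - \mu)^T R^{-1} (x - \mu) \right]^{-m} \]

(where $x \in R^p$, m > p/2 and v > 0), the cf becomes

\[ \phi_X(t) = \exp(i t^T \mu) \frac{\| \sqrt{v} R^{1/2} t \|^{m - p/2}}{2^{m - p/2 - 1}\, \Gamma(m - p/2)} K_{m - p/2}\left( \| \sqrt{v} R^{1/2} t \| \right). \]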

2.3 Lévy Representation 


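Infinite divisibility of the univariate Student's t distribution was first proved by Grosswald (1976); see also Kelker (1971) for a partial result. Later, Halgreen (1979) established the Lévy representation of its cf. For a multivariate t, Takano (1994) provided the first proof of infinite divisibility and the corresponding Lévy representation. Consider the standard case $\mu = 0$ and $R = I_p$. In this case, after suitable transformation, the joint pdf (2.1) can be written in the form

\[ f(x) = \frac{\Gamma(m + p/2)}{\pi^{p/2}\, \Gamma(m)} \left( 1 + \| x \|^2 \right)^{-(m + p/2)}. \]

The corresponding cf is

\[ \phi(t) = \frac{2^{1 - m}}{\Gamma(m)} \| t \|^m K_m(\| t \|). \tag{2.13} \]

Takano (1994) derived the Lévy representation of (2.13) in the form

\[ \phi(t) = \exp \left[ \int_{R^p} \left\{ \exp(i t^T x) - 1 \right\} \frac{2}{\| x \|^p} \left\{ \int_0^{\infty} g_m(2w)\, L_{p/2}\left( \sqrt{2w}\, \| x \| \right) dw \right\} dx \right], \]

where

\[ g_m(x) = \frac{2}{\pi^2 x \left\{ J_m^2(\sqrt{x}) + Y_m^2(\sqrt{x}) \right\}}, \]

\[ L_{\alpha}(z) = (2 \pi)^{-\alpha} z^{\alpha} K_{\alpha}(z), \]

\[ J_{\alpha}(x) = \frac{(x/2)^{\alpha}}{\sqrt{\pi}\, \Gamma(\alpha + 1/2)} \int_{-1}^{1} \left( 1 - z^2 \right)^{\alpha - 1/2} \exp(i x z)\, dz, \]

and

\[ Y_{\alpha}(x) = \frac{1}{\sin(\alpha \pi)} \left\{ \cos(\alpha \pi) J_{\alpha}(x) - J_{-\alpha}(x) \right\}. \]

Note that $J_{\alpha}(\cdot)$ and $Y_{\alpha}(\cdot)$ are the Bessel functions of the first and second kinds, respectively, of order $\alpha$.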
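Now consider (2.13) itself as a joint pdf

\[ f_m(x) = C \| x \|^m K_m(\| x \|), \tag{2.14} \]

where the normalizing constant C is

\[ C = \frac{2^{1 - m - p/2}}{(2 \pi)^{p/2}\, \Gamma(m + p/2)}. \]

Using properties of the MacDonald function $K_m(\cdot)$, (2.14) can be reduced to the simpler forms

\[ f_{1/2}(x) = C \sqrt{\frac{\pi}{2}} \exp(-\| x \|), \]

\[ f_{n + 1/2}(x) = C \sqrt{\frac{\pi}{2}} \exp(-\| x \|) \sum_{k=0}^{n} \frac{(n + k)!}{k!\, (n - k)!}\, 2^{-k} \| x \|^{n - k}, \]

and

\[ f_n(x) = C \| x \|^n \left[ \frac{1}{2} \sum_{k=0}^{n - 1} \frac{(-1)^k (n - k - 1)!}{k!} \left( \frac{\| x \|}{2} \right)^{2k - n} + (-1)^{n + 1} \sum_{k=0}^{\infty} \frac{1}{k!\, (n + k)!} \left( \frac{\| x \|}{2} \right)^{2k + n} \left\{ \log \left( \frac{\| x \|}{2} \right) - \frac{1}{2} \psi(1 + k) - \frac{1}{2} \psi(n + k + 1) \right\} \right]. \]

Takano (1994) established further that the joint pdf (2.14) is also infinitely divisible and that its cf

\[ g(t) = \left( 1 + \| t \|^2 \right)^{-(m + p/2)} \]

admits the Lévy representation

\[ g(t) = \exp \left[ (2m + p) \int_{R^p} \left\{ \exp(i t^T x) - 1 \right\} \frac{L_{p/2}(\| x \|)}{\| x \|^p}\, dx \right], \]

where $L_{p/2}(\cdot)$ is as defined above.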


3 


Linear Combinations, Products, and Ratios 


3.1 Linear Combinations 


The distribution of linear combinations of independent t random vari- 
ables has been studied by numerous authors, among them Fisher (1935), 
Chapman (1950), Fisher and Healy (1956), Ruben (1960), Patil (1965), 
Ghosh (1975), and Walker and Saw (1978). Johnson and Weerahandi 
(1988) tackled the distribution of linear combinations for multivariate t 
random vectors. Their results are included here for completeness and to 
motivate further multivariate extensions. We hope that our readers will 
benefit from studying this material, which contains fruitful ideas and 
also refers to the original papers for further details. 

Chapman (1950) considered the difference $D = X_1 - X_2$ of two independent t random variables $X_j$ with common degrees of freedom $\nu$. If $\nu$
is odd, then it is known that the characteristic function of $Y_j = X_j/\sqrt{\nu}$
is


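$$\phi_\nu(t) = E[\exp(itY_j)] = \frac{\sqrt{\pi}\exp(-|t|)}{2^{\nu-1}\,\Gamma(\nu/2)} \sum_{k=0}^{(\nu-1)/2} \frac{((\nu-1)/2 + k)!}{k!\,((\nu-1)/2 - k)!}\, (2|t|)^{(\nu-1)/2 - k}. \qquad (3.1)$$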


Using this representation, Chapman provided the following general expression for the pdf of $W = D/\sqrt{\nu}$

$$f(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itw)\,\phi_\nu^2(t)\, dt,$$


which may be integrated to obtain the pdf of D in a closed form for values 
such as v = 1,3,5, and so on. Chapman tabulated the distribution of D 
for v = 1,3, 5, 7,9, 11. 
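
The finite sum in (3.1) is easy to evaluate directly. The following Python sketch is only an illustration: the function name `phi_odd` is a hypothetical helper introduced here, not notation from Chapman (1950). It evaluates the characteristic function of $Y = X/\sqrt{\nu}$ for odd $\nu$ and can be compared with the familiar closed forms $\exp(-|t|)$ for $\nu = 1$ and $\exp(-|t|)(1 + |t|)$ for $\nu = 3$.

```python
import math


def phi_odd(nu, t):
    """Characteristic function (3.1) of Y = X / sqrt(nu), X ~ Student t, odd nu."""
    if nu < 1 or nu % 2 == 0:
        raise ValueError("nu must be a positive odd integer")
    n = (nu - 1) // 2
    a = abs(t)
    total = 0.0
    for k in range(n + 1):
        # coefficient ((nu-1)/2 + k)! / (k! ((nu-1)/2 - k)!)
        coef = math.factorial(n + k) / (math.factorial(k) * math.factorial(n - k))
        total += coef * (2.0 * a) ** (n - k)
    prefactor = math.sqrt(math.pi) * math.exp(-a) / (2.0 ** (nu - 1) * math.gamma(nu / 2.0))
    return prefactor * total


if __name__ == "__main__":
    for nu in (1, 3, 5):
        print(nu, phi_odd(nu, 0.5))
```

Because the sum has only $(\nu + 1)/2$ terms, products such as $\phi_\nu^2(t)$ remain exponentials times polynomials in $|t|$, which is what makes closed-form inversion possible for small odd $\nu$.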




Fisher and Healy (1956) considered the mixture $D = a_1 X_1 + a_2 X_2$
of two independent t random variables $X_j$ with degrees of freedom $\nu_j$
when $a_j > 0$, $j = 1, 2$. It is obvious that the characteristic function of
$D$ is the product


$$\phi_{\nu_1}(a_1 |t|)\, \phi_{\nu_2}(a_2 |t|),$$


where $\phi_\nu(\cdot)$ is as defined in (3.1). Since the product is, apart from the factor $\exp\{-(a_1 + a_2)|t|\}$, simply a polynomial in $|t|$ of degree $(\nu_1 + \nu_2)/2 - 1$, it can be expanded in a finite series
of terms of $\phi_m(\cdot)$ in which the highest value of $m$ is $\nu_1 + \nu_2 - 1$. For
example, in the special case $\nu_1 = 3$ and $\nu_2 = 5$, one can write


exp {(sin 9 + cos 6) t} 3 (sin 6t) ds (cos 8t) 


es (ene 


575 — 10V3tan6 + 9V5 tan? 0 ; 
eee [(V3 sing + V5 cos 8) t] 


ay (ages) « [ (v3sing + V5cos8) t] ; 


from this one can easily deduce the pdf of D. 

Ruben (1960) provided results on the distribution of $D = X_1 \sin\theta - X_2 \cos\theta$, when $X_j$ are independent t random variables with degrees of
freedom $\nu_j$ and $\theta$ is a fixed angle between 0 and 90 degrees. This statistic was originally proposed by Fisher (1935) as the basis for testing or
estimating the difference in means of two unconnected and totally unknown normal populations, the “fiducial distribution” of the difference
between the latter quantity and the corresponding sample mean difference, when suitably standardized, being supposed to be that of the
statistic $D$. Ruben obtained the pdf of $D$ in the integral form


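$$f(d) = \int_0^1 \frac{\sqrt{\psi(s)}}{\sqrt{\nu_1 + \nu_2}\; B((\nu_1 + \nu_2)/2,\, 1/2)} \left\{1 + \frac{d^2 \psi(s)}{\nu_1 + \nu_2}\right\}^{-(\nu_1 + \nu_2 + 1)/2} \frac{s^{\nu_1/2 - 1} (1 - s)^{\nu_2/2 - 1}}{B(\nu_1/2,\, \nu_2/2)}\, ds, \qquad (3.2)$$

where

$$\psi(s) = \frac{(\nu_1 + \nu_2)\, s (1 - s)}{\nu_1 (1 - s) \sin^2\theta + \nu_2\, s \cos^2\theta}.$$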




It follows directly from (3.2) that $D$ may be expressed in the form
$$D = \frac{X}{\sqrt{\psi(S)}},$$
where $X$ is a Student’s t random variable with degrees of freedom $\nu_1 + \nu_2$ and $S$ is a Beta variable with parameters $\nu_1/2$ and $\nu_2/2$, that is,
a variable with pdf given by the second term under the integral sign
in (3.2), with the first term under the integral sign representing the
conditional pdf of $X/\sqrt{\psi(S)}$ for fixed $S$.
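
The mixture structure just described can be checked numerically. The sketch below assumes the integrand of (3.2) has exactly the form stated in words above: a Student t density with $\nu_1 + \nu_2$ degrees of freedom, rescaled by $\sqrt{\psi(s)}$, averaged over a Beta$(\nu_1/2, \nu_2/2)$ density. The names `ruben_pdf` and `beta_fn` are hypothetical helpers. The substitution $s = \sin^2 u$ is a numerical convenience chosen here to remove the endpoint singularities of the Beta weight; it is not taken from Ruben (1960).

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def ruben_pdf(d, nu1, nu2, theta, steps=20000):
    """Midpoint-rule evaluation of the integral form of the pdf of
    D = X1 sin(theta) - X2 cos(theta), with 0 < theta < pi/2."""
    n = nu1 + nu2
    c_t = 1.0 / (math.sqrt(n) * beta_fn(n / 2.0, 0.5))   # t-density constant
    c_b = 1.0 / beta_fn(nu1 / 2.0, nu2 / 2.0)            # Beta-density constant
    sin2, cos2 = math.sin(theta) ** 2, math.cos(theta) ** 2
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        s = math.sin(u) ** 2
        psi = n * s * (1.0 - s) / (nu1 * (1.0 - s) * sin2 + nu2 * s * cos2)
        t_part = c_t * math.sqrt(psi) * (1.0 + d * d * psi / n) ** (-(n + 1) / 2.0)
        # s^(nu1/2 - 1) (1 - s)^(nu2/2 - 1) ds = 2 sin^(nu1 - 1)(u) cos^(nu2 - 1)(u) du
        weight = 2.0 * math.sin(u) ** (nu1 - 1) * math.cos(u) ** (nu2 - 1)
        total += t_part * c_b * weight
    return total * h


if __name__ == "__main__":
    theta = math.pi / 4
    c = math.sin(theta) + math.cos(theta)
    # For nu1 = nu2 = 1, D is Cauchy with scale sin(theta) + cos(theta).
    print(ruben_pdf(0.7, 1, 1, theta), c / (math.pi * (c * c + 0.7 ** 2)))
```

The Cauchy case $\nu_1 = \nu_2 = 1$ is a convenient sanity check, since a linear combination of independent Cauchy variables is again Cauchy with scale $\sin\theta + \cos\theta$.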
In the special case $\nu_1 = \nu_2 = \nu$ and $\theta = 45$ degrees, (3.2) reduces to


2T(v + 1/2)P((v + 1)/2) lv+lvp d? 
= Aea o a a ak =i eric ae tomers 
f(a) vy MPu/Dw D 2 te 9 at a 
where $_2F_1$ denotes the Gauss hypergeometric function. By using the
appropriate four of the group of 24 transformations of the hypergeometric function $_2F_1$ (see, for example, Whittaker and Watson, 1952, page
284), the above pdf may be expressed in the following three additional
ways
_ [ZT +.1/2)P((v +. 1)/2) Py” 
A = YS emne +S) 


l-v lv+2 @ 
alar ah W 
_  J2T(v +1/2)T((v + 1)/2) gN TCH 
a = Yo oari (+E) 


11 2 æ 
x oi (645.5 ) , (3.4) 


2 PFZ 
and 


_ [2T +1/2T(( + 1)/2) g2 \ Utne 
ey = ar (E) 


x oF l+v l-v v+2 d 
eg 2? 2 a Op 


(3.5) 


Note that (3.3) and (3.5) may each be expanded as a terminating series 
(refer to the definition of the Gauss hypergeometric function) when v is 
odd, and also that (3.4) is expressed as the product of the pdf of a t ran- 
dom variable with degrees of freedom 2v and the Gauss hypergeometric 
function. 




In the special case $\nu_1 = \infty$ and $\nu_2 = \nu$, (3.2) reduces to


T(v)0((v + 1)/2) fv +1 1 d 
-=R +a a 
V2r sin br (v/2)T (v + 1/2) 2 2 2sin’ 6 
where $_1F_1$ is the confluent hypergeometric function. Using Kummer’s
first transformation for the confluent hypergeometric function, one can
obtain the alternative form
r 1)/2 d? 
o OAE DIONE M 
vV2r sin OL (v/2)L (v + 1/2) 2sin* 0 


x 1A (j+ t in) 
2° 2? Qsin?@) 
Ruben (1960) also provided expressions for the cdf of D, but these were 
infinite series involving incomplete gamma function ratios. For tables of 
percentage points of D, see Sukhatme (1938), Fisher and Yates (1943), 
and Weir (1966). 

Ghosh (1975) provided explicit expressions for the cdf in terms of simple hypergeometric functions when $D = X_1 - X_2$ and $X_j$ are independent
Student’s t random variables with common degrees of freedom $\nu \le 4$. In
particular, Ghosh obtained the following expressions for $\Pr(0 < D < d)$


$$\frac{1}{\pi} \tan^{-1}\left(\frac{d}{2}\right),$$


f(a) 


a {(1+q)B(q) ~ (1- )K(@)}, 
2/3d(18+d?) 1 d 
rape ee (aa) l 


and 


1 
——— ¢ (8p — 31p? + 48p? + 5p + 2) E 
ag è p’ + 48p° + 5p + 2) E(p) 
-t-10 + 98 + +2) 40}, 


for $\nu = 1$, $\nu = 2$, $\nu = 3$, and $\nu = 4$, respectively, where $p = d^2/(16 + d^2)$,
$q = d^2/(8 + d^2)$,

$$E(x) = \int_0^{\pi/2} \sqrt{1 - x \sin^2 s}\; ds$$

is the complete elliptic integral of the second kind, and

$$K(x) = \int_0^{\pi/2} \frac{ds}{\sqrt{1 - x \sin^2 s}}$$

is the complete elliptic integral of the first kind. Similar expressions
for Pr(0 < D < d) as a finite sum of terms can be obtained for any 
positive integer v. Ghosh also provided a tabulation of the numerical 
values of Pr(D < d) for v = 1(1)20 and d = 0.0(0.5)10.0. 
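
For readers who want to evaluate such expressions without tables, the two complete elliptic integrals are simple to compute. The sketch below uses the midpoint rule on $[0, \pi/2]$, which converges very quickly for these smooth periodic integrands. The names `ellip_k`, `ellip_e`, and `prob_nu1` are hypothetical helpers, and the argument $x$ is the parameter appearing under the square root, as in the definitions of $E(x)$ and $K(x)$ above. The case $\nu = 1$ needs no elliptic integral: $D$ is then Cauchy with scale 2, so $\Pr(0 < D < d) = \pi^{-1}\tan^{-1}(d/2)$.

```python
import math


def ellip_k(x, steps=2000):
    """Complete elliptic integral of the first kind K(x), 0 <= x < 1."""
    h = (math.pi / 2.0) / steps
    return h * sum(1.0 / math.sqrt(1.0 - x * math.sin((i + 0.5) * h) ** 2)
                   for i in range(steps))


def ellip_e(x, steps=2000):
    """Complete elliptic integral of the second kind E(x), 0 <= x <= 1."""
    h = (math.pi / 2.0) / steps
    return h * sum(math.sqrt(1.0 - x * math.sin((i + 0.5) * h) ** 2)
                   for i in range(steps))


def prob_nu1(d):
    """Pr(0 < D < d) for D = X1 - X2 with independent Cauchy X1, X2 (nu = 1)."""
    return math.atan(d / 2.0) / math.pi


if __name__ == "__main__":
    print(ellip_k(0.5), ellip_e(0.5), prob_nu1(2.0))
```

A useful independent check on any implementation is Legendre's relation $E(x)K(1-x) + E(1-x)K(x) - K(x)K(1-x) = \pi/2$.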
Walker and Saw (1978) expressed the cdf of the linear combinations of
any number of Student’s t random variables with odd degrees of freedom
as a mixture of t distributions. Define the linear combination as


$$D = \sum_{k=1}^{n} a_k X_k,$$

where $a_k > 0$, $a_1 + \cdots + a_n = 1$, and $X_k$ are independent Student’s t
random variables with degrees of freedom $\nu_k = 2m_k + 1$, $m_k = 0, 1, 2, \ldots$.


Construct a matrix $Q$ whose element in row $i$ and column $j$ is the coefficient of $\exp(-|t|)\,|t|^{j}$ in $\phi_{2i+1}(t)$ (see equation (3.1)), that is,


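$$Q_{ij} = \begin{cases} \dfrac{i!\,(2i - j)!\,2^{j}}{(2i)!\,(i - j)!\,j!} & \text{if } j = 0, 1, \ldots, i, \\[2mm] 0 & \text{if } j > i, \end{cases} \qquad i = 0, 1, 2, \ldots.$$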


The characteristic function of $D$ when $\nu_k = 2m_k + 1$ can be written as

$$\phi(t) = E[\exp(itD)] = \prod_{k=1}^{n} \phi_{\nu_k}(a_k t),$$


and since $\exp(|t|)\,\phi(t)$ is a polynomial in $|t|$, one may obtain a vector
$\lambda$ such that


$$\phi(t) = \exp(-|t|) \sum_{k=0}^{s} \lambda_k |t|^{k}.$$


Walker and Saw (1978) showed that the cdf of $D$ can be expressed as the
weighted sum


$$\Pr(D \le d) = \sum_{k=0}^{s} \eta_k H_k(d),$$


where


$$\eta^T = \lambda^T Q^{-1}, \qquad s = \sum_{k=1}^{n} m_k,$$

and


$$H_k(d) = \Pr\left[X_{2k+1} \le \sqrt{2k+1}\; d\right].$$


This result can be used to calculate probabilities of $D$ utilizing only
tables of the Student’s t distribution.
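
The construction of the mixture weights can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Walker and Saw's own algorithm: the components are taken to be the normalized variables $Y_k = X_k/\sqrt{\nu_k}$ so that the characteristic function (3.1) applies directly, the entries of $Q$ are read off from the coefficients in (3.1), and the helper names `phi_coeffs`, `poly_mul`, and `mixture_weights` are hypothetical. Because $Q$ is lower triangular, $\eta^T Q = \lambda^T$ is solved by back-substitution.

```python
import math


def phi_coeffs(i):
    """Row i of Q: coefficients of exp(-|t|) |t|^j in phi_{2i+1}(t), j = 0..i."""
    f = math.factorial
    return [f(i) * f(2 * i - j) * 2.0 ** j / (f(2 * i) * f(i - j) * f(j))
            for j in range(i + 1)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def mixture_weights(a, m):
    """Weights eta_0..eta_s for D = sum_k a_k Y_k, Y_k = X_k / sqrt(2 m_k + 1).

    Assumes a_k > 0 and sum(a) == 1, so the exponential factors combine
    into exp(-|t|).
    """
    lam = [1.0]
    for a_k, m_k in zip(a, m):
        # phi_{2 m_k + 1}(a_k t) = exp(-a_k |t|) * sum_j Q[m_k][j] a_k^j |t|^j
        row = [c * a_k ** j for j, c in enumerate(phi_coeffs(m_k))]
        lam = poly_mul(lam, row)
    s = sum(m)
    eta = [0.0] * (s + 1)
    for j in range(s, -1, -1):  # back-substitution for eta^T Q = lambda^T
        acc = lam[j] - sum(eta[k] * phi_coeffs(k)[j] for k in range(j + 1, s + 1))
        eta[j] = acc / phi_coeffs(j)[j]
    return eta


if __name__ == "__main__":
    print(mixture_weights([0.3, 0.7], [1, 2]))
```

The weights sum to one, since every row of $Q$ has leading entry 1 and $\lambda_0 = \phi(0) = 1$; they need not all be positive, so "mixture" is meant in the signed sense.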

The general distribution of $D = X_1 - X_2$ is very complicated when
the $X_j$ are independent Student’s t random variables with $\nu_1 \ne \nu_2$. It is
therefore natural to ask whether a reasonably good approximation can
be found. Chapman (1950) suggested the simple approximation


$$\Pr(D \le d) \approx \Phi\left(d \Big/ \sqrt{\frac{\nu_1}{\nu_1 - 2} + \frac{\nu_2}{\nu_2 - 2}}\right),$$


where $\Phi(\cdot)$ is the cdf of the standard normal distribution. This idea is, of
course, prompted by the fact that $X_1\sqrt{(\nu_1 - 2)/\nu_1}$ and $X_2\sqrt{(\nu_2 - 2)/\nu_2}$
are both asymptotically standard normal random variables, approaching
normality more rapidly than $X_1$ and $X_2$ do. However, a few calculations
show that this approximation is quite unsatisfactory even for moderately
large values of $\nu_1 > \nu_2 > 2$. Based on a t-approximation proposed by
Patil (1965), an improved approximation is


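$$\Pr(D \le d) \approx T_\nu\left(\frac{d}{h}\right),$$


where $T_\nu$ is the cdf of the Student’s t distribution with $\nu$ degrees of freedom,


$$\nu = 4 + \frac{\left(\dfrac{\nu_2 \cos^2\theta}{\nu_2 - 2} + \dfrac{\nu_1 \sin^2\theta}{\nu_1 - 2}\right)^2}{\dfrac{\nu_2^2 \cos^4\theta}{(\nu_2 - 2)^2 (\nu_2 - 4)} + \dfrac{\nu_1^2 \sin^4\theta}{(\nu_1 - 2)^2 (\nu_1 - 4)}},$$

and


$$h = \sqrt{\frac{\nu - 2}{\nu} \left(\frac{\nu_2 \cos^2\theta}{\nu_2 - 2} + \frac{\nu_1 \sin^2\theta}{\nu_1 - 2}\right)},$$

where $\nu_1 > \nu_2 > 4$. Ghosh (1975) derived another approximation that
requires only tables of the normal distribution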


= d \ _ dg(d/v2) | Qı(d) ý Q0 + LW 
Pr(D <d) = +(<) PRD [aw cs 2 A 




+0(4)} | 


where 
Qi(d) = (14+) (d +10), 
2 
Qo(d) = ae (3d° + 98d* + 620d? + 168) 
+a (d? — 10d* + 36d? — 456) , 
1+)? | 10 8 4 “nag 72 
Q3(d) = TAT (d'° + 66d? + 1016d® — 1296d* — 65328d? — 141408) 
A(L +A) a 20 8 4 2 
ENE — 58d — = 
err. (3d'° — 58d? — 280d* + 6864d* — 700324?) 
1277\(1 +A) 
256’ 


and À = v2/v,. Ghosh showed evidence to suggest that this is far more 
accurate than Patil’s approximation. 

Johnson and Weerahandi (1988) considered linear combinations of
t random vectors in a Bayesian context. Suppose $y_{11}, \ldots, y_{1m_1}$ and
$y_{21}, \ldots, y_{2m_2}$ are independent samples from two p-variate normal populations $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$, respectively, where the population
covariance matrices are unknown and not necessarily equal. Let $\bar{y}_i$ and
$S_i$ denote the corresponding sample mean vectors and sample covariance matrices. Johnson and Weerahandi considered the distribution of
the quadratic form


$$Q = (\delta - d)^T V^{-1} (\delta - d),$$


where $\delta = \mu_1 - \mu_2$, $d = \bar{y}_1 - \bar{y}_2$, and $V$ is any $p \times p$ positive definite
matrix. Note that $\mu_i - \bar{y}_i$ have the central p-variate t distribution with
covariance matrix $S_i/(m_i - p)$ and degrees of freedom $m_i - p$. Under
the diffuse prior distribution


P(Hy, 21, Mo, 22) = [Dil l[Zal, (3.6) 


the posterior cdf of Q can be expressed as 


FQ = LEM) Fan lag yay} P 


where $n = m_1 + m_2 - 2p$ and the expectation is taken with respect to the




beta random variable B, which is distributed as Be((mı — p)/2, (m2 — 
p)/2). The w; are defined in terms of @ (an arbitrary constant) and À; 
by the recursive relation 


where 


and $\lambda_j$ are the ordered eigenvalues $\lambda_1 \le \cdots \le \lambda_p$ of the matrix


l ys, v-2 4—1 


-1/2 -1/2 
mB mo(1 zp Sa 


In the particular case $V = cS_1 + (1 - c)S_2$, the $\lambda_j$ can be conveniently
obtained by using the relation


mə(1 — B) +m, BE; 


M4 = GamgB(L— B){e+ (1 — 08)" 


where $\xi_1, \ldots, \xi_p$ are the eigenvalues of $S_1^{-1} S_2$. In the univariate case,
(3.7) reduces to give the posterior cdf of $Y = (\mu_1 - \mu_2) - (\bar{y}_1 - \bar{y}_2)$ as


yVm, — mz — 2 


2 2 
mya) BE m E EA 
m, B mz 1-B 


F(y) = E Tm3+m2-2 


where $s_1^2$ and $s_2^2$ are the sample variances and the expectation is taken
with respect to $B$, which is distributed as $Be((m_1 - 1)/2, (m_2 - 1)/2)$.
The result (3.7) can also be used to deduce the pdf of $U = (T_1 + T_2)^T (T_1 + T_2)$, where the $T_i$ are independent p-variate random vectors
having the t distribution with covariance matrix $(a_i/m_i) I_p$ and degrees
of freedom $m_i$. It turns out that the cdf of $U$ is


u(m,+m2) B(1-B) ie 


F(u) = E |Fpm4m 
= [7 7 ( Pp a, + B (az — a) 


where the expectation is taken with respect to $B$, which is distributed
as $Be(m_1/2, m_2/2)$. Johnson and Weerahandi also established several
interesting bounds on (3.7), one upper bound being


nq 
ra 


where 8; = V~!/2S,;V~—!/2. Furthermore, it was shown that similar re- 
sults hold if the diffuse prior in (3.6) is replaced by the natural conjugate 
prior distribution. 


Tne 1 


mB it m(l — Bj? 


Fha) < E 


3.2 Products 


The distribution of products of Student’s t random variables has been 
studied by Harter (1951) and Wallgren (1980). 

Products of Student’s t random variables arise naturally in classification problems. In many educational and industrial problems it may
be necessary to classify persons or objects into one of two categories:
those fit and those unfit for a particular purpose. In formulating a
classification problem, assume that for $p$ tests one knows the scores of
$N_1$ individuals known to belong to population $\Pi_1$ and of $N_2$ individuals
known to belong to population $\Pi_2$, along with those of the individual
under consideration, a member of the population $\Pi$, where it is known
a priori that $\Pi$ is identical with either $\Pi_1$ or $\Pi_2$. Assume further that
the distributions of the test scores of the individuals making up $\Pi_1$ and
$\Pi_2$ are two p-dimensional normal distributions, which possess the same
covariance matrix but are independent of each other. In order to classify
the individual in question into either $\Pi_1$ or $\Pi_2$, Wald (1944) introduced
the statistic $V$ given by


p p 
V = oa ak), whe, (3.8) 


i=1 j=1 


where $n = N_1 + N_2 - 2$, $s^{ij}$ is the $(i, j)$th element of the inverse of the
matrix $S$ defined by


$$s_{ij} = \frac{1}{n} \sum_{k=1}^{n} Y_{i,k} Y_{j,k}$$


and Yi, (i = 1,...,p; k = 1,...,2 + 2) are iid normally distributed 
random variables with unit variances and expected values E(Y;,,) = 0, 
k= 1, OERE E(Y;n+1) = f, and E(Yin+2) = u2. 


In the particular case p = 1, (3.8) can be written as 


V = nrima ; (3.9) 


where 
1 > 3 
= = Yi vk: 
n 
k=1 


Thus, one sees that the V in (3.9) is a product of two independent 
Student’s t variates. Harter (1951) derived the exact distribution of this 
V. In particular, he showed that the pdf of | V | is given by 

1 


f(lv|) = ey Chen | v | +s+n/2) 


j=0 


if | v |> n/2, and by 
29 (-1)in"/ ti 1)Ín”/2+i 


_ exp ( exp (—43/2) 
fle) = ~ al (n/2) Lata (1+) 


k 

24+ 27 +n\ < (243) 2+n+2j 

2 perce ane et 
xT (ir P (2k)! PUk+ 7 


| v [7 0+i+n/2) 


if | v |< n/2. 

Wallgren (1980) studied the product of two correlated Student’s t variates. This was motivated by hypothesis testing problems that assume
that the relationship between two regression lines $y = \alpha_1 + \beta_1 x$ and
$y = \alpha_2 + \beta_2 x$ holds for all real $x$. Although it is often true that such a
relationship holds for all real $x$, there are instances in which the relationship may hold true only for $x$’s in a given interval, say $[c_1, c_2]$. Wallgren
(1980) showed that the statistic for testing this hypothesis is a product
of two correlated t-variates; specifically, it takes the form $W = XY/S^2$,
where $(X, Y)$ has a bivariate normal distribution with means $(\mu_1, \mu_2)$,
common variance $\sigma^2$, and the correlation matrix


$$R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. \qquad (3.10)$$


Moreover, $\nu S^2/\sigma^2$ is independent of $(X, Y)$ and has a chi-squared distribution with degrees of freedom $\nu$.




The limiting distribution of $W = XY/S^2$ as $\nu \to \infty$ is that of $Z = XY/\sigma^2$.
The distribution of $Z$ has been studied by Aroian et al. (1978). The
study of the distribution of $W$ is, therefore, a generalization of the study
by Aroian et al. since $\sigma^2$ is unknown.

In the central case $\mu_1 = 0$ and $\mu_2 = 0$, Wallgren (1980) showed that
the cdf $F(w) = \Pr(W \le w)$ of $W$ is given by


F(w) = 1- [ Qu (650, a8 


for -1 <p < 0, w > 0; 
0 
/ Qu (0; p,w) dê 
a-T 


T+a 
Fw) = 1- f Q, (8; p, w) d0 
0 
for 1 > p > 0, w > 0; and 
0 
f enu d0 
Q 


vsinĝsin(0 +4) \"” 
w + vsin ĝ sin (8 + A) , 


/1 — 72 
a= uean (=F), 


p 


for -l<p<0,w <0; 


for 1 > p > 0, w < 0, where 


rQ (65,0) = ( 


and the angle $A$ is defined by $\sin A = \sqrt{1 - \rho^2}$, $\cos A = \rho$ for $0 < A < \pi$.
The corresponding pdf $f(w)$ can be obtained by differentiation. The
pdf has a singularity at $w = 0$ and, considered as a function of $\rho$ and $w$,
$f(w; \rho) = f(-w; -\rho)$. The limit of $f(w)$ as $\rho \to 1^{-}$ is the $F_{1,\nu}$ density.
Moreover, if $\rho_1 < \rho_2$, then $F(w; \rho_1) > F(w; \rho_2)$ for any $w$.

In the noncentral case, the cdf F (w) is given by 


T a h(s) æL h(s) exp (—v*/2) 


2 oy Nees 
x (ereraa AE æ) duds 


vl- 




Fa [owe h(s) exp aoe) 


—ws? À À 
xo ee 2) +Ai + pv iuda, 
y1- 
where $\lambda_i = \mu_i/\sigma$, $i = 1, 2$, are the noncentrality parameters and
v¥/*s¥—) exp (—vs?/2) 
2-2)/2P (v/2) 

The two double integrals above are bounded above by ®(A2) and ®(—2g), 
respectively. As a function of 4; and A2, F has the following properties: 


or, equivalently, 


h(s) = 


F (w; M, à2) = F(w;A2,A1), 
F (w; —à1,—à2) = F(w;à, A2), 
F (w; à, —à2) = F(w;—à, A2), 


lim F(wjcd2,r42) = Oifc>0, 
A200 


lim F(w;cr2,A2) = 1/2ifc=0, 


A200 


lim F(w;cd2,A2) = life<90, 


A200 
lim F(w;à, B) = &(-B), 
1700 


and 


lim F(w;\,,B) = &(B). 
1-00 
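Also, for $\lambda_1 > 0$, $\lambda_2 > 0$, $w > 0$, and $-1 < \rho < 0$, $F(w; \lambda_1, \lambda_2)$ is a
strictly decreasing function of $\lambda_1$ and $\lambda_2$. Thus, the maximum of $F(w)$
over the region $\lambda_1 \ge 0$, $\lambda_2 \ge 0$ occurs at $\lambda_1 = \lambda_2 = 0$.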
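Since $(X, Y)$ and $S^2$ are independent, the first two moments of $W$ are
given by

$$E(W) = E(XY)\, E(1/S^2)$$

and

$$E(W^2) = E(X^2 Y^2)\, E(1/S^4).$$

It is known (see, for example, Craig, 1936) that

$$E(XY) = \sigma^2 (\lambda_1 \lambda_2 + \rho),$$

$$E(X^2 Y^2) = \sigma^4 \left(\lambda_1^2 \lambda_2^2 + \lambda_1^2 + \lambda_2^2 + 4\rho\lambda_1\lambda_2 + 1 + 2\rho^2\right),$$

and

$$E\left(\frac{1}{S^{2i}}\right) = \frac{\nu^{i}\, \Gamma(\nu/2 - i)}{(2\sigma^2)^{i}\, \Gamma(\nu/2)}$$

for $\nu > 2i$.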
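
These moment formulas combine into the mean and variance of $W$ with no further integration. The Python sketch below is an illustration only (the function name `product_moments` is a hypothetical helper); it uses $E(XY) = \sigma^2(\lambda_1\lambda_2 + \rho)$, the corresponding expression for $E(X^2Y^2)$, and the inverse moments of $S^2$ implied by $\nu S^2/\sigma^2 \sim \chi^2_\nu$. The scale $\sigma$ cancels in $W = XY/S^2$, so it does not appear as an argument.

```python
import math


def product_moments(nu, rho, lam1=0.0, lam2=0.0):
    """Mean and variance of W = XY / S^2 (finite variance needs nu > 4)."""
    if nu <= 4:
        raise ValueError("the variance requires nu > 4")

    def inv_s_moment(i):
        # E(1 / S^(2i)) in units of sigma^(-2i), valid for nu > 2i
        return (nu / 2.0) ** i * math.gamma(nu / 2.0 - i) / math.gamma(nu / 2.0)

    # E(XY) and E(X^2 Y^2) in units of sigma^2 and sigma^4
    e_xy = lam1 * lam2 + rho
    e_x2y2 = (lam1 ** 2 * lam2 ** 2 + lam1 ** 2 + lam2 ** 2
              + 4.0 * rho * lam1 * lam2 + 1.0 + 2.0 * rho ** 2)
    mean = e_xy * inv_s_moment(1)
    second = e_x2y2 * inv_s_moment(2)
    return mean, second - mean ** 2


if __name__ == "__main__":
    print(product_moments(6, 0.5))
```

In the central case the mean reduces to $\rho\,\nu/(\nu - 2)$, which tends to $\rho$ as $\nu \to \infty$, in line with the limiting variable $Z = XY/\sigma^2$.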


3.3 Ratios 


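For a bivariate t random vector $(X_1, X_2)$ with degrees of freedom $\nu$,
mean vector $(m_x, m_y)$, and correlation matrix $I_2$, define the ratio
$W = X_1/X_2$. The distribution of this ratio is of interest in problems in
econometrics and ranking and selection. Press (1969) derived the pdf of
$W$ as


$$f(w) = \frac{k_1}{1 + w^2} \left[1 + \frac{k_2\, q}{(q^*)^{\nu+1}} \left\{2 T_{\nu+1}\left(\frac{q\sqrt{\nu + 1}}{q^*}\right) - 1\right\}\right], \qquad -\infty < w < \infty, \qquad (3.11)$$


where $T_{\nu+1}$ denotes the cdf of the univariate Student’s t distribution
with degrees of freedom $\nu + 1$ and $k_1$, $k_2$, $q$, and $q^*$ are constants defined
by

$$k_1 = \frac{1}{\pi} \left(1 + \frac{m_x^2 + m_y^2}{\nu}\right)^{-\nu/2},$$

$$k_2 = \frac{\sqrt{\pi}\; \nu^{(\nu+2)/2}\; \Gamma((\nu + 1)/2)}{2\, \Gamma((\nu + 2)/2)} \left(1 + \frac{m_x^2 + m_y^2}{\nu}\right)^{\nu/2},$$

$$q = \frac{m_x w + m_y}{\sqrt{1 + w^2}},$$

and

$$(q^*)^2 = m_x^2 + m_y^2 + \nu - q^2,$$


respectively. In the special case $m_x = m_y = 0$, (3.11) is the pdf of a
Cauchy distribution. The asymptotic distribution of $W$ as $\nu \to \infty$ is
that of the ratio of independent normal random variables. This fact can
be verified as follows. Let $f_\infty(w) = \lim_{\nu \to \infty} f(w)$. Then it is easy to
check that

$$\lim_{\nu \to \infty} \frac{k_2\, q}{(q^*)^{\nu+1}} = \frac{q}{2\phi(q)},$$

$$\lim_{\nu \to \infty} \frac{q\sqrt{\nu + 1}}{q^*} = q,$$

$$\lim_{\nu \to \infty} T_{\nu+1}(z) = \Phi(z),$$

and

$$\lim_{\nu \to \infty} k_1 = \frac{1}{\pi} \exp\left\{-\frac{1}{2}\left(m_x^2 + m_y^2\right)\right\},$$


where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard
normal distribution; thus, the limiting density $f_\infty$ is given by


$$f_\infty(w) = \frac{1}{\pi (1 + w^2)} \left[1 + \frac{q\,\{2\Phi(q) - 1\}}{2\phi(q)}\right] \exp\left\{-\frac{1}{2}\left(m_x^2 + m_y^2\right)\right\},$$

which is the same density found by Marsaglia (1965, page 196, equation
(5)) for the ratio of independent normal random variables with means
$(m_x, m_y)$ and unit variances.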
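
The limiting density is straightforward to evaluate. The sketch below is an illustration under the assumption that $q = (m_x w + m_y)/\sqrt{1 + w^2}$ and that the limiting density has the form of an exponential factor times a Cauchy kernel times the bracket $1 + q\{2\Phi(q) - 1\}/\{2\phi(q)\}$; the function names `ratio_pdf_limit` and `total_mass` are hypothetical helpers. The substitution $w = \tan u$ turns the heavy-tailed integral over the real line into a bounded integral over $(-\pi/2, \pi/2)$, which gives a simple check that the density integrates to one.

```python
import math


def ratio_pdf_limit(w, mx, my):
    """Limiting (nu -> infinity) density of W = X1/X2 for independent
    unit-variance normals with means (mx, my)."""
    q = (mx * w + my) / math.sqrt(1.0 + w * w)
    phi_q = math.exp(-0.5 * q * q) / math.sqrt(2.0 * math.pi)   # normal pdf at q
    big_phi_q = 0.5 * (1.0 + math.erf(q / math.sqrt(2.0)))      # normal cdf at q
    bracket = 1.0 + q * (2.0 * big_phi_q - 1.0) / (2.0 * phi_q)
    return math.exp(-0.5 * (mx * mx + my * my)) * bracket / (math.pi * (1.0 + w * w))


def total_mass(mx, my, steps=2000):
    """Integral of the density over the real line, using w = tan(u)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        u = -math.pi / 2.0 + (i + 0.5) * h
        w = math.tan(u)
        total += ratio_pdf_limit(w, mx, my) * (1.0 + w * w)   # dw = (1 + w^2) du
    return total * h


if __name__ == "__main__":
    print(ratio_pdf_limit(0.5, 1.0, 3.0), total_mass(1.0, 3.0))
```

With $m_x = m_y = 0$ the bracket equals one and the function returns the standard Cauchy density, matching the special case noted for (3.11).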

The density $f(w) = f(w; m_x, m_y)$ may be confined to positive values
$m_x > 0$, $m_y > 0$ since (3.11) shows that


$$f(w; -m_x, m_y) = f(-w; m_x, m_y),$$

$$f(w; m_x, -m_y) = f(-w; m_x, m_y),$$

and

$$f(w; -m_x, -m_y) = f(w; m_x, m_y).$$


Figure 3.1 shows how variations in $(m_x, m_y, \nu)$ affect the shape of the
density. It may be seen by reference to Marsaglia (1965) that the shapes
are similar to those of the ratio of normal random variables, even for
small values of $\nu$.

The percentage points of $W$ defined by $\Pr(W \le w) = p$ are tabulated in Press (1969) for cumulative probabilities $p$ = 0.01, 0.05, 0.10,
0.90, 0.95, 0.99; $\nu$ = 1, 2, 5, 10, 30; and for some 16 selected values
of $(m_x, m_y)$. Here we provide the tables of $w$ for $(m_x, m_y) = (0, 0)$,
$(m_x, m_y) = (1, 3)$, $(m_x, m_y) = (3, 1)$, and $(m_x, m_y) = (3, 3)$.


Fig. 3.1. Densities of the t-ratio distribution (3.11) for $(m_x, m_y, \nu)$ = (0,0,1),
(0,0,30), (1,3,1), (1,3,30), (3,1,1), (3,1,30), (3,3,1), and (3,3,30). Each of the eight panels plots the t-ratio pdf against $w$ for $-4 \le w \le 4$.


Percentage points w for (m_x, m_y) = (0,0)

  ν  |  p=0.01  |  p=0.05  |  p=0.1  |  p=0.9  |  p=0.95  |  p=0.99
  1  | -31.820  |  -6.314  | -3.078  |  3.078  |   6.314  |  31.820
  2  | -31.820  |  -6.314  | -3.078  |  3.078  |   6.314  |  31.820
  5  | -31.820  |  -6.314  | -3.078  |  3.078  |   6.314  |  31.820
 10  | -31.820  |  -6.314  | -3.078  |  3.078  |   6.314  |  31.820
 30  | -31.820  |  -6.314  | -3.078  |  3.078  |   6.314  |  31.820


Percentage points w for (m_x, m_y) = (1,3)

  ν  |  p=0.01  |  p=0.05  |  p=0.1  |  p=0.9  |  p=0.95  |  p=0.99
  1  | -10.254  |  -1.794  | -0.721  |  1.321  |   2.394  |  10.853
  2  |  -5.791  |  -0.902  | -0.357  |  1.120  |   1.795  |   6.842
  5  |  -1.331  |  -0.382  | -0.166  |  1.018  |   1.486  |   6.464
 10  |  -1.041  |  -0.312  | -0.135  |  0.951  |   1.288  |   2.892
 30  |  -0.681  |  -0.256  | -0.109  |  0.922  |   1.211  |   2.350


Percentage points w for (m_x, m_y) = (3,1)

  ν  |  p=0.01  |  p=0.05  |  p=0.1  |  p=0.9  |  p=0.95  |  p=0.99
  1  | -51.268  |  -8.970  | -3.604  |  6.604  |  11.970  |  54.267
  2  | -58.076  | -10.187  | -4.034  |  7.337  |  13.422  |  61.290
  5  | -64.675  | -11.427  | -4.524  |  8.007  |  14.774  |  67.984
 10  | -67.570  | -11.984  | -4.756  |  8.293  |  15.355  |  70.894
 30  | -69.742  | -12.406  | -4.934  |  8.505  |  15.788  |  73.074


Percentage points w for (m_x, m_y) = (3,3)

  ν  |  p=0.01  |  p=0.05  |  p=0.1  |  p=0.9  |  p=0.95  |  p=0.99
  1  | -12.970  |  -1.852  | -0.442  |  2.242  |   3.652  |  14.770
  2  |  -8.172  |  -0.474  |  0.183  |  2.141  |   3.205  |  11.028
  5  |  -2.668  |   0.229  |  0.425  |  2.038  |   2.802  |   7.498
 10  |  -0.118  |   0.334  |  0.477  |  1.986  |   2.627  |   6.019
 30  |   0.130  |          |         |  1.943  |   2.497  |



Since it is evident from (3.11) that $f(w) \to 0$ rapidly as $m_x$ and $m_y$
become large, values of $m_x$, $m_y$ greater than 3 are not considered.

The t-ratio distribution has one or two modes, depending upon the
values of the parameters. The locations of these modes are solutions of
the equation


Te (£vo+i) + Aw) (Lvr%3) = B(w), (3.12) 


* * 


where 
a 2 ee Erer 
SA (qt)? 3qw +m VI + w?’ 
—v/2 
Ba) =< 24 AW) wado Fayre ley} 
w) = a7 2 27T ((v +1)/2) m,V1+w? + 3qw 


—(v+2)/2 


Teram VFM {+a} 
Vat ((v+1)/2)  q* {ma V1 +w? + 3qw} 
x {w 1+w?- qme}, 


and $q$ and $q^*$ are as defined above. Note that since $T_\nu$ may be expressed
in closed form in terms of elementary functions, (3.12) yields the modes
in terms of elementary functions only.


From (3.11),
$$\lim_{w \to \infty} q = m_x$$
and
$$\lim_{w \to \infty} q^* = \sqrt{m_y^2 + \nu}.$$
Thus,
$$\lim_{w \to \infty} w^2 f(w) = \text{constant}.$$

Hence the distribution of $W$ can have no finite moments of order one or higher.

Kappenman (1971) extended Press’s ratio distribution to the multivariate case by considering the joint pdf of $W^T = (X_2/X_1, X_3/X_1, \ldots, X_p/X_1)$, where $X = (X_1, \ldots, X_p)^T$ is a p-variate t random vector with
degrees of freedom $\nu$, mean vector $\mu$, and correlation matrix $R$. The
expressions for the joint pdf turned out to depend on whether $p$ is odd
or even. Introduce the following notation


$$V^T = (1, W^T),$$

$$M = V^T R^{-1} V,$$

$$K = -2 V^T R^{-1} \mu,$$

$$L = \nu + \mu^T R^{-1} \mu,$$

$$a = \frac{L}{M} - \frac{K^2}{4M^2},$$

and

$$b = \frac{K}{2M}.$$


Then the expressions for f(w) are 


MEN "SX" E e E T 
fw) = 2 a2* pp 
Pl? (Ma)’*?)/? Tw) Z -1-2k 
ca —(v+p)/2 
<f u* {au? +1} v+P)/2 du 
if p is odd; 
2abP-1 |R|! v/T ((v + p)/2 
rice [R| ((v + p)/2) 


nPI? (Ma) PT (y/2) 
p/2 = si 
EEO” 


x f u2k-l {a?u? + ari du 
—b/a 


(p—2)/2 
p-l a\ 2k 
a2 Eor (3) 


k=0 


—b/a s 
x | u2k {a?u? ¥ ap eTa au 
0 


if p is even and b < 0; and 


2ab?- RJT! W/T ((v + p)/2) 
aP M(¥+P)/2T (y/2) 


p/2 
p—1 ay 2k-1 
# b a a (5) 
foe) 
«| ykol {aw pa E di 
bja 
(p—2)/2 
p-1 ay 2k 
i 2 a) (5) 
b/a 
«| u fay? Jay O a 
0 


if p is even and b > 0. The integrals in these expressions can easily 
be rewritten in terms of the gamma and incomplete beta functions; see 
Section 3 of Kappenman (1971) for details. 


fw) = 


4 


Bivariate Generalizations and Related 
Distributions 


In this chapter, we shall survey a number of specific bivariate distribu- 
tions that contain Student’s ¢ components. 


4.1 Owen’s Noncentral Bivariate t Distribution


Let $Y_1$ and $Y_2$ have the bivariate normal distribution with zero means,
unit variances, and correlation 1. Let $\nu S^2$ have the chi-squared distribution with degrees of freedom $\nu$ and be independent of the $Y$’s. Then
$X_1 = (Y_1 + \delta_1)/S$ and $X_2 = (Y_2 + \delta_2)/S$ have the noncentral univariate
t distributions with degrees of freedom $\nu$ and noncentrality parameters
$\delta_1$ and $\delta_2$, respectively. Owen (1965) studied the joint distribution of
$(X_1, X_2)$, a noncentral bivariate t distribution.
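The marginal cdf of $X_j$, $j = 1, 2$, may be written as

$$\Pr(X_j \le y) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_0^\infty x^{\nu-1} \phi(x)\, \Phi\left(\frac{xy}{\sqrt{\nu}} - \delta_j\right) dx, \qquad (4.1)$$

where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard
normal distribution. Integrating by parts repeatedly, one obtains for
odd values of $\nu$

$$\Pr(X_j \le y) = \Phi\left(-\delta_j \sqrt{B}\right) + 2T\left(\delta_j \sqrt{B}, A\right) + 2\left[M_1 + M_3 + \cdots + M_{\nu-2}\right]$$

and for even values of $\nu$

$$\Pr(X_j \le y) = \Phi(-\delta_j) + \sqrt{2\pi}\left[M_0 + M_2 + \cdots + M_{\nu-2}\right].$$

Here,

$$A = \frac{y}{\sqrt{\nu}}, \qquad B = \frac{\nu}{\nu + y^2},$$

$$T(h, a) = \frac{1}{2\pi} \int_0^a \frac{\exp\left\{-(1 + x^2) h^2/2\right\}}{1 + x^2}\, dx \qquad (4.2)$$

(a function discussed and tabulated in Section 46.4 of Kotz et al., 2000),
and the $M$’s are defined recursively by

$$M_0 = A\sqrt{B}\, \phi\left(\delta_j \sqrt{B}\right) \Phi\left(\delta_j A \sqrt{B}\right),$$

$$M_1 = B\left\{\delta_j A M_0 + \frac{A}{\sqrt{2\pi}}\, \phi(\delta_j)\right\},$$

$$M_2 = \frac{B}{2}\left\{\delta_j A M_1 + M_0\right\},$$

$$M_3 = \frac{2B}{3}\left\{\delta_j A M_2 + M_1\right\},$$

and

$$M_k = \frac{(k - 1) B}{k}\left\{a_k \delta_j A M_{k-1} + M_{k-2}\right\}, \qquad k \ge 4,$$

where $a_k = 1/((k - 2) a_{k-1})$, $k \ge 3$, and $a_2 = 1$. Two special cases of
(4.1) are

$$\Pr(X_j \le 0) = \Phi(-\delta_j)$$

and

$$\Pr(X_j \le 1) = 1 - \Phi^2\left(\frac{\delta_j}{\sqrt{2}}\right), \qquad \nu = 1.$$

Also, if $\delta_j = 0$, then (4.1) is just the cdf of the Student’s t distribution.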
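
Owen's finite-sum form for the marginal cdf translates directly into code. The sketch below is an illustration, not Owen's published algorithm: it assumes $A = y/\sqrt{\nu}$, $B = \nu/(\nu + y^2)$, the starting values $M_0$ and $M_1$, and the recursion $M_k = \{(k-1)/k\}\,B\,\{a_k \delta A M_{k-1} + M_{k-2}\}$ with $a_2 = 1$ and $a_k = 1/((k-2)a_{k-1})$. The helper names (`norm_pdf`, `norm_cdf`, `owen_t`, `noncentral_t_cdf`) are hypothetical, and $T(h, a)$ is evaluated by a plain midpoint rule rather than by a specialized routine.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def owen_t(h, a, steps=2000):
    """Owen's T(h, a) as in (4.2), by the midpoint rule on [0, a]."""
    if a == 0.0:
        return 0.0
    dx = a / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return total * dx / (2.0 * math.pi)


def noncentral_t_cdf(y, nu, delta):
    """Pr(X <= y) for a noncentral t variable with integer nu >= 1."""
    A = y / math.sqrt(nu)
    B = nu / (nu + y * y)
    sb = math.sqrt(B)
    M = [A * sb * norm_pdf(delta * sb) * norm_cdf(delta * A * sb)]            # M_0
    M.append(B * (delta * A * M[0] + A * norm_pdf(delta) / math.sqrt(2.0 * math.pi)))  # M_1
    a_k = 1.0                                     # a_2 = 1
    for k in range(2, nu - 1):                    # builds M_2, ..., M_{nu-2}
        if k > 2:
            a_k = 1.0 / ((k - 2) * a_k)           # a_k = 1 / ((k - 2) a_{k-1})
        M.append((k - 1) / k * B * (a_k * delta * A * M[k - 1] + M[k - 2]))
    if nu % 2 == 1:
        # odd nu: M_1 + M_3 + ... + M_{nu-2}
        return (norm_cdf(-delta * sb) + 2.0 * owen_t(delta * sb, A)
                + 2.0 * sum(M[1:nu - 1:2]))
    # even nu: M_0 + M_2 + ... + M_{nu-2}
    return norm_cdf(-delta) + math.sqrt(2.0 * math.pi) * sum(M[0:nu - 1:2])


if __name__ == "__main__":
    print(noncentral_t_cdf(1.3, 5, 0.7))
```

For even $\nu$ the result involves only normal pdf and cdf values, which is why the even case was historically the more convenient one for hand computation.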


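Owen (1965) expressed the joint cdf of $(X_1, X_2)$ in terms of functions
of the form

$$Q(y, \delta; 0, R) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_0^R x^{\nu-1} \phi(x)\, \Phi\left(\frac{xy}{\sqrt{\nu}} - \delta\right) dx$$

and

$$Q(y, \delta; R, \infty) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_R^\infty x^{\nu-1} \phi(x)\, \Phi\left(\frac{xy}{\sqrt{\nu}} - \delta\right) dx.$$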


For example, if one assumes, without loss of generality, that $y_1 > y_2$,
then the following relations hold

$$\Pr(X_1 \le y_1, X_2 \le y_2) = Q(y_1, \delta_1; 0, R) + Q(y_2, \delta_2; R, \infty)$$

and

$$\Pr(X_1 \le y_1, X_2 \le y_2) = \Pr(X_2 \le y_2),$$

for $\delta_1 > \delta_2$ and $\delta_1 < \delta_2$, respectively. The formulas for $Q(y, \delta; 0, R)$ and
$Q(y, \delta; R, \infty)$ can be obtained by integration by parts. Since $Q(y, \delta; 0, R) + Q(y, \delta; R, \infty) = \Pr(X_j \le y)$, it is sufficient to know the formula for one
of the $Q$ terms. Owen obtained the following formulas for $Q(y, \delta; 0, R)$
for odd and even values of $\nu$, respectively


Q(y,6;0,R) = &(R)—2T(R,(AR — ô)/R) 
-2T (5VB, (5AB — R) /B5) + 27 (5VB, A) 
+1 {5 <0}-1+2{ My +H, + Mj + Hs 
30 Me oe A -2} 
and 
Qly, 5;0,R) = ®(-6)+ v2n{ Mg + Ho + Mi + H 
oh M? 3+ H,-2}. 


Here, T(h, a) is as defined in (4.2) and the M*’s are defined recursively 
by 


Mj = AvBo(5VB) {8 (54vB) - © (54B - R)/ VB) }, 


Mi = B[sAM; + Ao (VB) {4 (sAvB) 
-¢ ((5AB - R)/ VB) }], 


B 
My = 3 {AM} +M} - Li, 




k-1)B 
M; = C-IE Car AM; a + Mi-a} - Det, k > 3, 


and the H’s are defined recursively by 


Ho = —$(R)®(AR - ô) 
and 
Hy = ak+2RHk1, k>1, 
where 
Lp-1 = ak+2RLk-2, k>3 


with the initial value Lı = (ABR/2)¢(R)¢(AR — ô) and 


pee eee 
(k — 2)ak-1 ° 


with the initial values a; = a2 = 1. 


ar k>3 


4.2 Siddiqui’s Noncentral Bivariate t Distribution 


Siddiqui (1967) considered the joint distribution of Student’s t variates
when $(Y_{1i}, Y_{2i})$, $i = 1, \ldots, N$, is a random sample from the bivariate
normal distribution with zero means, unit variances, and correlation
coefficient $\rho$. Let


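$$\bar{Y}_1 = \frac{1}{N} \sum_{i=1}^{N} Y_{1i},$$

$$\bar{Y}_2 = \frac{1}{N} \sum_{i=1}^{N} Y_{2i},$$

$$S_j^2 = \frac{1}{N} \sum_{i=1}^{N} \left(Y_{ji} - \bar{Y}_j\right)^2, \qquad j = 1, 2,$$

and

$$R = \frac{1}{N S_1 S_2} \sum_{i=1}^{N} \left(Y_{1i} - \bar{Y}_1\right)\left(Y_{2i} - \bar{Y}_2\right).$$

The interest is in the joint distribution of the Student’s t variates

$$(X_1, X_2) = \left(\frac{\sqrt{N - 1}\; \bar{Y}_1}{S_1},\; \frac{\sqrt{N - 1}\; \bar{Y}_2}{S_2}\right). \qquad (4.3)$$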


It is well known that the joint pdf of $(\bar{Y}_1, \bar{Y}_2, S_1, S_2, R)$ is


f (Hi, G2; $1, 82, r) 
NY (si82)¥? (1 - r2) 9” a fz +s 
T(N — 2) (1 = p) ~P P|- 20 =p) 


+9” + s2 — 2p (9 +175) 52) | 


for $-\infty < \bar{y}_1 < \infty$, $-\infty < \bar{y}_2 < \infty$, $0 < s_1 < \infty$, $0 < s_2 < \infty$, and
$-1 < r < 1$ (see, for example, Kendall and Stuart, 1958). After suitable
transformation, Siddiqui obtained the joint pdf of $(X_1, X_2, R)$ in the
form

rw +2) (1- pyre (fs r2) 072/2 
(2n)3/2T (v + 3/2) (1 — b — cr)” t! 


2 2 —(v+1)/2 
«{0+2) +3) 
Vv V 


11 3 1+b+er 
x 20; (pprt yoy), (4.4) 


f (a1, 22,7) 


where $\nu = N - 1$, $_2F_1$ is the Gauss hypergeometric function,


PT1T2 


z? T2’ 
Vay Beaty da 


fee es 
x? z2 
yore NEE 


It is easily seen that the limit of this joint pdf as $\nu \to \infty$ is trivariate
normal with independent components $(X_1, X_2)$ and $R$. Integrating out


b= 


and 




$r$ in (4.4), Siddiqui showed further that the asymptotic joint pdf of
$(X_1, X_2)$ as $\nu \to \infty$ becomes
Tr(v + 2) (1- yore 


f (1, 22) ~ “Soro ee 


2 2 —(v+1)/2 
«{0+2) +2) 
v 


11 31-B+e 
F =; =). 

Sa (Gs PtP Ao ) 
The exact joint pdf of $(X_1, X_2)$ was also given for the two special cases
$\nu = 1$ and $\nu = 3$. For $\nu = 1$, the joint pdf reduces to a bivariate Cauchy
distribution

(1 — p?) csc? 8 T 

‘ = OO +- tO>, (4. 
an) = ETB Trapt + (5-6) cote}, (4.5) 


where 
2p (1 — p°) (1+ yrye) 
JVlty?J/1+y2 


In fact, if $\rho = 0$ in (4.5), then one arrives at a product of two independent
Cauchy densities. For $\nu = 3$ the joint pdf is


32v2(1—9?)" eN (BY 
f(t1,22) = (+ uy (+2) Is, 


cos@ = 


where 
T'(9/2)0'?(k + 1/2) l fk 
-È P(k +9/2)r?(1/2)k! & S 5-21 mai (- z) (7) 


x fa -b — c) 5/2 —(1-b+ e l 


4.3 Patil and Liao’s Noncentral Bivariate t Distribution 


Patil and Liao (1970) provided an extension of Siddiqui’s work when
$(Y_{1i}, Y_{2i})$, $i = 1, \ldots, N$, is a random sample from the bivariate normal
distribution with zero means, common variance $\sigma^2$, and correlation coefficient $\rho$. Instead of considering the joint distribution of (4.3), they
considered the joint distribution of


JNY, | 


(X1,X2) = (F 3 (4.6) 




where S is the pooled sample standard deviation defined by 


aes (Yi = ñ)? + pane (Y2: z P)’ 

2N -1 f 
Exact expressions were given for the joint pdf and cdf of (X1, X2) and 
for the corresponding marginal distributions. For instance, if N is odd 
and equal to 2q + 3, then the joint pdf of (X1, X2) can be represented 
in the form 


f (a1, 22) : 


Beat a(q + 1) (1p?) T2(q + 1) 


x 5 (o) (-1)9-* eee T (2q -k +1) 


p 


k 
2_9 2) ~(k+2) 
2(1+p) 8(q+1) (1-9?) 
2 


q—k l 
p T(k+142) 1 
7 2 (27) l! ES 


-(k+1+2) 
y? — 2pyrye + y? | 


*8q+ (1-2) 


while if N is even, then the joint pdf of (X1, X2) becomes 
(1 +o- 


fns) = NaN a -anD 
= 1— N)/2 : 
E(w ero fha} 
1 Ta o 
2(1+p) 4(q-1)(1-p?) 


4.4 Krishnan’s Noncentral Bivariate t Distribution 


Krishnan (1972) provided another extension of Siddiqui’s work when
$(Y_{1i}, Y_{2i})$, $i = 1, \ldots, N$, is a random sample from the bivariate normal
distribution with means $(\gamma, \delta)$, unit variances, and correlation coefficient
$\rho$. She considered the joint distribution of

VNI VN- vi) 


(4.7) 


a al ( Sı i S2 




where $S_1$ and $S_2$ are correlated chi-squared random variables independent of $\bar{Y}_1$ and $\bar{Y}_2$, respectively. Series representations were derived
for the joint pdf and the cdf of $(X_1, X_2)$. One of the representations
given for the joint pdf is


_ B o0 i a z? —(N+j)/2 
feue = Bafe (is a) 0+ 


j=0 


x dm,j» (4.8) 


where 
2(1—p?)’ 


E / A3 NN 24-N p3-N j A 
B = A t-A +ô — 2p7ô) }, 


a Vk (2pAz1£2)7 
(N = DG = k)! 


PE D o a J AEII (N +k = 1)/2)}, if k is even, 
f if k is odd, 


O (2f (2A) (NN +i+j N+j+m-i 
tos = Do eee a S 


VA (pô — 7) a1 
JN -1+zr? 
and 


VA (py — ô) 22 
JN —14+23— 


In the central case $\gamma = \delta = 0$, (4.8) reduces to the form derived by Patil
and Liao (1970).




4.5 Krishnan’s Doubly Noncentral Bivariate t Distribution 


If $Y$ is a normal random variable with mean $\delta$ and unit variance, and $S^2$
is an independent noncentral chi-squared random variable with degrees
of freedom $\nu$ and noncentrality parameter $\lambda$, then

$$X = \frac{Y\sqrt{\nu}}{S} \qquad (4.9)$$

is said to have the doubly noncentral univariate t distribution with degrees of freedom $\nu$ and noncentrality parameters $\delta$ and $\lambda$. The properties
of this distribution have been studied by several authors; see Robbins
(1948), Patnaik (1955), Krishnan (1959), Krishnan (1967a), Bulgren and
Amos (1968), and Krishnan (1968); see also Chapter 31 in Johnson et
al. (1995) for a summary. The pdf, the expectation, and the variance of
$X$ are given by


e ee M exp {— (A + 6?) /2} 
fla) = DD. EU) Be U2 + 72,0) +B) 


ôr \'! g2 \ T(t +2k+1)/2 
x | = 1+— , 
(æ) (+7) 


k=0 1 


and 


Var(X) = (1+8?) — iF, (1.5:-3) = {ECO}, y>2, 


where $_1F_1$ denotes the confluent hypergeometric function.

A bivariate analog of (4.9) was defined by Krishnan (1970) as follows.
Let $(Y_1, Y_2)$ follow a bivariate normal distribution with zero means, unit
variances, and correlation coefficient $\rho$. Let $(S_1, S_2)$ follow independently
a noncentral bivariate chi-squared distribution with degrees of freedom
$\nu$, noncentrality parameter $\lambda$, and correlation coefficient $\rho$ (Krishnan,
1967b). Then the random vector


(4.10) 


is said to have the doubly noncentral bivariate t distribution with degrees
of freedom $\nu$ and noncentrality parameter $\lambda$. Krishnan (1970) derived




the corresponding joint pdf of $(X_1, X_2)$ and provided an application involving the sample means and variances from two correlated nonhomogeneous normal populations. The special case of (4.10) for $S_1 = S_2 = S$
was considered by Patil and Kovner (1969), who provided expressions
for the joint cdf of $(X_1, X_2)$ and showed that when the means of $Y_j$
are zero the probabilities of $(X_1, X_2)$ in rectangular regions are monotone functions of $\rho$. In the special case $S_1 = S_2 = S$ and $\lambda = 0$, the
distribution of $(X_1, X_2)$ reduces to that of the central bivariate t.


4.6 Bulgren et al.’s Bivariate t Distribution 
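Suppose Y1, ..., Ym, Ym+1, ..., Ym+n denote iid normal random variables
with common mean μ and common variance σ². Bulgren et al. (1974)
considered the joint distribution of (X1, X2) defined by

$$X_1 = \frac{\sqrt{m}\,\bar{Y}_1}{S_1}, \qquad X_2 = \frac{\sqrt{m+n}\,\bar{Y}}{\sqrt{\left\{(m-1)S_1^2 + (n-1)S_2^2\right\}/(m+n-2)}},$$

where

$$\bar{Y}_1 = \frac{1}{m}\sum_{i=1}^{m} Y_i, \qquad \bar{Y}_2 = \frac{1}{n}\sum_{i=m+1}^{m+n} Y_i, \qquad \bar{Y} = \frac{1}{m+n}\sum_{i=1}^{m+n} Y_i, \qquad S_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(Y_i - \bar{Y}_1\right)^2,$$

and

$$S_2^2 = \frac{1}{n-1}\sum_{i=m+1}^{m+n}\left(Y_i - \bar{Y}_2\right)^2.$$

The distribution of (X1, X2) is bivariate t with a different noncentrality
parameter for each variable. Note also that X1 and X2 have,
respectively, m − 1 and m + n − 2 degrees of freedom. Bulgren et al. (1974)
provided series representations for the joint pdf of (X1, X2). In the
central case μ = 0,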


g(m+n)/2 z2 —(m+n)/2 
fena) = “To (14+ 5) 
“(-1) (mtn. x2 z 
xy ji T 9 +j + aa 
j=0 
BS pees k/2 
2j m+k n-1 n+m 
D a (Cea) 
k=0 
2j—-k 
xak -vmt , (4.11) 
n(m +n — 2) 
where 
A = Xmm (mma Ulm +n—2), (mal), (mal) 
m+n 2 2 


In the noncentral case μ ≠ 0, the joint pdf is even more complicated.
Letting n = am, a > 0, and taking m → ∞ in (4.11), one observes that
the limiting distribution in the central case is the bivariate normal
distribution with zero means, unit variances, and correlation √(1/(1 + a)).
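The two statistics of this section are easy to compute from data. The sketch below is illustrative only: it assumes that X1 is the one-sample t statistic of the first m observations and that X2 uses the overall mean together with the pooled within-group variance, which is the reading consistent with the stated degrees of freedom m − 1 and m + n − 2. The function names are hypothetical.

```python
import math


def bulgren_statistics(y, m):
    """Compute (X1, X2) from a list y whose first m entries form group 1.

    Assumed definitions:
      X1 = sqrt(m) * mean(group 1) / S1
      X2 = sqrt(m + n) * mean(all) / sqrt(pooled variance)
    """
    n = len(y) - m
    first, second = y[:m], y[m:]
    ybar1 = sum(first) / m
    ybar2 = sum(second) / n
    ybar = sum(y) / (m + n)
    s1_sq = sum((v - ybar1) ** 2 for v in first) / (m - 1)
    s2_sq = sum((v - ybar2) ** 2 for v in second) / (n - 1)
    pooled = ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)
    x1 = math.sqrt(m) * ybar1 / math.sqrt(s1_sq)
    x2 = math.sqrt(m + n) * ybar / math.sqrt(pooled)
    return x1, x2


def limiting_correlation(a):
    """Correlation of the limiting bivariate normal when n = a * m."""
    return math.sqrt(1.0 / (1.0 + a))
```

For equal group sizes (a = 1) the limiting correlation is √(1/2), reflecting the fact that the first group contributes half of the overall mean.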


4.7 Siotani’s Noncentral Bivariate t Distribution 


Siotani (1976) considered the most general form of (4.6) introduced by
Patil and Liao (1970). Let Y be a p-variate normal random vector
with mean vector μ, unit variances, and correlation matrix R. Let
S = √((V1² + V2²)/(2ν)), where (V1, V2) has the central bivariate
chi-squared distribution with degrees of freedom ν and correlation coefficient
τ. Siotani derived the distribution of X = Y/S for general p and R. The
derivation required the joint pdf of (V1, V2) that was given by Siotani
(1959) in the form


os) vz) aa 1 1 
f (v1, v2) Bee e TE (CEST) 
exp d-n seus l ; 
2(1-7?) 
where 
$$c_k(\tau) = \frac{\Gamma((\nu + 2k)/2)}{k!\,\Gamma(\nu/2)}\left(1 - \tau^2\right)^{\nu/2}\tau^{2k}. \tag{4.12}$$




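From this, one can easily obtain the pdf of

$$W = \frac{S}{\sqrt{1 - \tau^2}} = \sqrt{\frac{V_1^2 + V_2^2}{2\nu\left(1 - \tau^2\right)}}$$

as

$$f(w) = \sum_{k=0}^{\infty} c_k(\tau)\, f_{2\nu+4k}(w), \tag{4.13}$$

where

$$f_{2\nu+4k}(w) = \frac{2\nu^{\nu+2k}}{\Gamma(\nu + 2k)}\, w^{2\nu+4k-1} \exp\left\{-\nu w^2\right\}. \tag{4.14}$$

Since c_0(τ) + c_1(τ) + ··· = 1, (4.13) is a mixture of (4.14) with the
weights given by (4.12). Thus the joint pdf of X is also obtained in the
same form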


$$f(\mathbf{x}) = \sum_{k=0}^{\infty} c_k(\tau)\, f_{p,k}(\mathbf{x}),$$

where the c_k(τ) are given by (4.12) and


} I (v + 2k + p/2) (1 —7)?/? 
(2vr)P/T (v + 2k) |R|? 


—(v+2k+p/2) 
x™R x} 


kk 1 
fpa (X) = ex {-5 A 
1 
2v 


H 
xfi 
T (v +2k + (p+l)/2) 
l= 


. 2 ID (v + 2k + p/2) 


k 
3 2(1-7?)xTR tu 
Qv + (1—7?)xTR-x | 
When p = 2, μ = 0, and ρ = τ (ρ is the correlation coefficient between
Y1 and Y2) this coincides with the pdfs derived by Patil and Liao (1970).
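The mixture interpretation rests on the weights c_k(τ) forming a probability distribution. A short numerical sketch, assuming the negative binomial form c_k(τ) = Γ(ν/2 + k) (1 − τ²)^{ν/2} τ^{2k} / {k! Γ(ν/2)} for (4.12), confirms that they sum to one; the function name is illustrative.

```python
import math


def mixture_weight(k, nu, tau):
    """Weight c_k(tau), assuming the negative binomial form of (4.12)."""
    if tau == 0.0:
        # Independent components: all mass sits on k = 0.
        return 1.0 if k == 0 else 0.0
    log_weight = (
        math.lgamma(nu / 2.0 + k)
        - math.lgamma(nu / 2.0)
        - math.lgamma(k + 1)
        + (nu / 2.0) * math.log(1.0 - tau * tau)
        + 2.0 * k * math.log(abs(tau))
    )
    return math.exp(log_weight)


def total_weight(nu, tau, terms=400):
    """Partial sum of the weights over k = 0, ..., terms - 1."""
    return sum(mixture_weight(k, nu, tau) for k in range(terms))
```

Working on the log scale avoids overflow in the gamma functions when k is large.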


4.8 Tiku and Kambo’s Bivariate ¢ Distribution 


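Suppose (X1, X2) has the bivariate normal distribution with means
(μ1, μ2), variances (σ1², σ2²), and correlation coefficient ρ. Its joint pdf
can be factorized as

$$f(x_1, x_2) = f(x_1 \mid x_2)\, f(x_2),$$

where

$$f(x_1 \mid x_2) = \left\{2\pi\sigma_1^2\left(1 - \rho^2\right)\right\}^{-1/2} \exp\left[-\frac{1}{2\sigma_1^2\left(1 - \rho^2\right)}\left\{x_1 - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}\left(x_2 - \mu_2\right)\right\}^2\right] \tag{4.15}$$

and

$$f(x_2) \propto \exp\left\{-\frac{1}{2\sigma_2^2}\left(x_2 - \mu_2\right)^2\right\}. \tag{4.16}$$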


Numerous nonnormal distributions can be generated by replacing
f(x1 | x2) and/or f(x2) by nonnormal distributions. Tiku and Kambo
(1992) studied the family of symmetric bivariate distributions obtained
by replacing (4.16) by the Student's t density

$$f(x_2) \propto \frac{1}{\sigma_2\sqrt{k}}\left\{1 + \frac{\left(x_2 - \mu_2\right)^2}{k\sigma_2^2}\right\}^{-\nu}, \tag{4.17}$$


where k = 2ν − 3 and ν > 2. This is motivated by the fact that in many
applications it is reasonable to assume that the difference Y1 − μ1 −
ρ(σ1/σ2)(Y2 − μ2) is normally distributed and the regression of Y1 on
Y2 is linear. Moreover, in numerous applications Y2 represents
time-to-failure with a distribution (Tiku and Gill, 1989; Gill et al., 1990), which
might be symmetric but is not normally distributed. Besides, most
types of time-to-failure data are such that a transformation cannot be
performed to impose normality on the underlying distribution (see, for
example, Mann, 1982, page 262).
On replacing (4.16) by (4.17), the joint pdf of (X1, X2) becomes


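$$f(x_1, x_2) \propto \frac{1}{\sigma_1\sigma_2\sqrt{k\left(1 - \rho^2\right)}}\left\{1 + \frac{\left(x_2 - \mu_2\right)^2}{k\sigma_2^2}\right\}^{-\nu} \exp\left[-\frac{1}{2\sigma_1^2\left(1 - \rho^2\right)}\left\{x_1 - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}\left(x_2 - \mu_2\right)\right\}^2\right]. \tag{4.18}$$

In the limit ν → ∞, (4.18) reduces to the bivariate normal pdf. Writing
μ_{ij} = E((Y1 − μ1)^i (Y2 − μ2)^j) for the cross product moment of order
i + j, one observes that all odd-order moments are zero and that the
first few even-order moments are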


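$$\mu_{11} = \rho\sigma_1\sigma_2, \qquad \mu_{20} = \sigma_1^2, \qquad \mu_{02} = \sigma_2^2,$$

$$\mu_{40} = 3\sigma_1^4\left(1 + \frac{2\rho^4}{2\nu - 5}\right),$$

$$\mu_{31} = 3\rho\sigma_1^3\sigma_2\left(1 + \frac{2\rho^2}{2\nu - 5}\right),$$

$$\mu_{22} = \sigma_1^2\sigma_2^2\left(1 + 2\rho^2 + \frac{6\rho^2}{2\nu - 5}\right),$$

$$\mu_{13} = 3\rho\sigma_1\sigma_2^3\left(1 + \frac{2}{2\nu - 5}\right),$$

and

$$\mu_{04} = \frac{3(2\nu - 3)}{2\nu - 5}\,\sigma_2^4.$$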


In fact, the moment generating function (mgf) of (Y1, Y2) is given by

$$E\left[\exp\left(\theta_1 X_1 + \theta_2 X_2\right)\right] = \exp\left\{\left(\mu_1 - \rho\frac{\sigma_1}{\sigma_2}\mu_2\right)\theta_1 + \frac{1}{2}\sigma_1^2\left(1 - \rho^2\right)\theta_1^2\right\} M_2\left(\theta_2 + \rho\frac{\sigma_1}{\sigma_2}\theta_1\right),$$

where M2(·) denotes the moment generating function of X2. This
moment generating function does not exist unless, of course, ν = ∞.
However, the characteristic function does exist and is given by Sutradhar
(1986). Estimation issues of the distribution (4.18) are discussed in
Section 10.1.
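The factorization f(x1, x2) = f(x1 | x2) f(x2) also gives a direct way to simulate from (4.18). The sketch below is an illustration under stated assumptions rather than the authors' procedure: it uses the fact that a density proportional to {1 + u²/k}^{−ν} is that of a Student's t variate with 2ν − 1 degrees of freedom multiplied by √(k/(2ν − 1)), and then draws X1 from the normal conditional (4.15). Function names are hypothetical.

```python
import math
import random


def sample_tiku_kambo(mu1, mu2, sigma1, sigma2, rho, nu, rng):
    """Draw one pair (X1, X2): t-type marginal for X2, normal conditional for X1."""
    k = 2.0 * nu - 3.0
    dof = 2.0 * nu - 1.0
    # Student's t with `dof` degrees of freedom: normal over root of chi2/dof.
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / dof)
    x2 = mu2 + sigma2 * math.sqrt(k / dof) * t
    cond_mean = mu1 + rho * (sigma1 / sigma2) * (x2 - mu2)
    cond_sd = sigma1 * math.sqrt(1.0 - rho * rho)
    x1 = rng.gauss(cond_mean, cond_sd)
    return x1, x2


def central_moment(pairs, mu1, mu2, i, j):
    """Monte Carlo estimate of E[(X1 - mu1)^i (X2 - mu2)^j]."""
    total = sum((a - mu1) ** i * (b - mu2) ** j for a, b in pairs)
    return total / len(pairs)
```

Comparing the simulated second-order moments with μ11 = ρσ1σ2, μ20 = σ1², and μ02 = σ2² gives a quick check that the scaling k = 2ν − 3 makes σ2² the variance of X2.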


4.9 Conditionally Specified Bivariate ¢ Distribution 


Let (X, Y) be a continuous random vector with joint pdf f_{X,Y}(x, y) over
R². Let f_X(x), f_Y(y) and f_{X|Y}(x | y), f_{Y|X}(y | x) denote the associated
marginal and conditional densities, respectively. Assume that X | Y and
Y | X are Student's t-distributed with the pdfs

$$f_{X|Y}(x \mid y) = \frac{\Gamma((\nu+1)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)}\sqrt{\sigma(y)}\left\{1 + \sigma(y)x^2\right\}^{-(\nu+1)/2} \tag{4.19}$$

and

$$f_{Y|X}(y \mid x) = \frac{\Gamma((\nu+1)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)}\sqrt{\tau(x)}\left\{1 + \tau(x)y^2\right\}^{-(\nu+1)/2}, \tag{4.20}$$

where x ∈ R, y ∈ R, ν > 0, and σ(y), τ(x) are some positive functions.
Writing the joint pdf of (X, Y) as a product of marginal and conditional
densities in both possible manners, one obtains

$$f_Y(y)\sqrt{\sigma(y)}\left\{1 + \sigma(y)x^2\right\}^{-(\nu+1)/2} = f_X(x)\sqrt{\tau(x)}\left\{1 + \tau(x)y^2\right\}^{-(\nu+1)/2}, \tag{4.21}$$

where x ∈ R and y ∈ R. Set

$$g(y) = \left\{f_Y(y)\sqrt{\sigma(y)}\right\}^{2/(\nu+1)}, \qquad h(x) = \left\{f_X(x)\sqrt{\tau(x)}\right\}^{2/(\nu+1)} \tag{4.22}$$

so that, after rearranging, (4.21) becomes

$$g(y) + y^2 g(y)\tau(x) - h(x) - x^2 h(x)\sigma(y) = 0, \tag{4.23}$$


which must be solved for σ, τ, g, and h. Kottas et al. (1999) recognized
that (4.23) is a special case of the functional equation

$$\sum_{k=1}^{n} f_k(x)\, g_k(y) = 0,$$

whose most general solution is given in the classical book by Aczél (1966,
page 161). Thus, with h(x), x²h(x) and g(y), y²g(y) being the systems
of mutually independent functions, the solution of (4.23) is found to be

$$\tau(x) = \frac{\lambda_3 + \lambda_4 x^2}{\lambda_1 + \lambda_2 x^2}, \qquad \sigma(y) = \frac{\lambda_2 + \lambda_4 y^2}{\lambda_1 + \lambda_3 y^2} \tag{4.24}$$

and

$$h(x) = \frac{1}{\lambda_1 + \lambda_2 x^2}, \qquad g(y) = \frac{1}{\lambda_1 + \lambda_3 y^2} \tag{4.25}$$

for λ_j ∈ R, j = 1, 2, 3, 4. Finally, substituting (4.22), (4.24), and (4.25)
into (4.21), the joint pdf is derived as


$$f_{X,Y}(x, y) = N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4)\left\{\lambda_1 + \lambda_2 x^2 + \lambda_3 y^2 + \lambda_4 x^2 y^2\right\}^{-(\nu+1)/2}, \tag{4.26}$$

where x ∈ R, y ∈ R, and N_ν(·) denotes the normalizing constant.
Utilizing certain compatibility conditions given in Arnold and Press (1989),
Kottas et al. found that (4.26) is a well-defined joint pdf if λ1 ∈ R₊ ∪ {0}
and λ_j ∈ R₊, j = 2, 3, 4. Moreover, if λ1 = 0, then one must have
ν ∈ (0, 1).
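The solution of the functional equation can be checked numerically. The sketch below assumes the forms τ(x) = (λ3 + λ4x²)/(λ1 + λ2x²), σ(y) = (λ2 + λ4y²)/(λ1 + λ3y²), h(x) = 1/(λ1 + λ2x²), and g(y) = 1/(λ1 + λ3y²), and verifies both that the left-hand side of (4.23) vanishes and that {1 + σ(y)x²}/g(y) reproduces the kernel of (4.26). The tuple `lam` holding (λ1, λ2, λ3, λ4) and all function names are illustrative.

```python
def sigma(y, lam):
    """sigma(y) as assumed for (4.24); lam = (lam1, lam2, lam3, lam4)."""
    return (lam[1] + lam[3] * y * y) / (lam[0] + lam[2] * y * y)


def tau(x, lam):
    """tau(x) as assumed for (4.24)."""
    return (lam[2] + lam[3] * x * x) / (lam[0] + lam[1] * x * x)


def g(y, lam):
    """g(y) as assumed for (4.25)."""
    return 1.0 / (lam[0] + lam[2] * y * y)


def h(x, lam):
    """h(x) as assumed for (4.25)."""
    return 1.0 / (lam[0] + lam[1] * x * x)


def kernel(x, y, lam):
    """Quantity inside the braces of (4.26)."""
    return lam[0] + lam[1] * x * x + lam[2] * y * y + lam[3] * x * x * y * y


def functional_equation_residual(x, y, lam):
    """Left-hand side of (4.23); it should be zero for every (x, y)."""
    return (
        g(y, lam)
        + y * y * g(y, lam) * tau(x, lam)
        - h(x, lam)
        - x * x * h(x, lam) * sigma(y, lam)
    )
```

The second identity explains why the joint density depends on (x, y) only through the kernel raised to the power −(ν + 1)/2.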
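The normalizing constant is given by the integral

$$\frac{1}{N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4)} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(\lambda_1 + \lambda_2 x^2 + \lambda_3 y^2 + \lambda_4 x^2 y^2\right)^{-(\nu+1)/2} dx\, dy. \tag{4.27}$$

In the case λ1 ≠ 0, making the transformation s = (λ2/λ1)x², t =
(λ3/λ1)y², letting φ = λ1λ4/(λ2λ3), and using the integral
representation of the Beta function,

$$B(a, b) = \int_0^{\infty} z^{a-1}(1 + z)^{-(a+b)}\, dz, \qquad a > 0,\ b > 0,$$

one obtains

$$\frac{1}{N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4)} = \frac{B\left(\frac{1}{2}, \frac{\nu}{2}\right)}{\lambda_1^{(\nu-1)/2}\sqrt{\lambda_2\lambda_3}}\int_0^{\infty}\frac{dz}{(1 + z)^{\nu/2}\sqrt{z(1 + \phi z)}}.$$

Letting w = z/(1 + z) and manipulating, Kottas et al. obtained

$$N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \frac{\lambda_1^{(\nu-1)/2}\sqrt{\lambda_2\lambda_3}}{B\left(\frac{1}{2}, \frac{\nu}{2}\right) I\left(\frac{1}{2}, \frac{1}{2}, \frac{\nu+1}{2}; 1 - \phi\right)}, \tag{4.28}$$

where

$$I(a, b, c; z) = \int_0^1 w^{b-1}(1 - w)^{c-b-1}(1 - zw)^{-a}\, dw \tag{4.29}$$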


for c > b > 0. In the case λ1 = 0, similar arguments show that

$$N_\nu(0, \lambda_2, \lambda_3, \lambda_4) = \frac{\left(\lambda_2\lambda_3\right)^{\nu/2}\lambda_4^{(1-\nu)/2}}{B\left(\frac{1}{2}, \frac{\nu}{2}\right) B\left(\frac{1-\nu}{2}, \frac{\nu}{2}\right)},$$

where 0 < ν < 1. The integral (4.29) converges for z < 1. For |z| > 1,
Kottas et al. provided an alternative representation of (4.28) in terms
of the Gauss hypergeometric function (see, for example, Magnus et al.,
1966, page 54). It is also possible to represent (4.28) in terms of elliptic
integrals of the first and second kind (Carlson, 1977, Chapter 9). For
example, if ν = 1, then (4.28) can be easily rearranged to yield

$$N_1(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \frac{\sqrt{\lambda_1\lambda_4}}{2\pi R_F(0, 1/\phi, 1)},$$

where R_F is the elliptic integral of the first kind defined by

$$R_F(a, b, c) = \frac{1}{2}\int_0^{\infty}\left\{(z + a)(z + b)(z + c)\right\}^{-1/2} dz$$

with a, b, c nonnegative, and at most one of them equal to zero.

If 0 < ν < 1, then (4.26) does not possess finite moments; thus,
from here on we shall consider the case ν > 1. If ν > max(m, n), for
non-negative integers m, n, then Kottas et al. showed that

m+1 vr=m m+) n+l vl. 
pees) = e ae e 
mur HET (3) T (3,91 - 4) 
provided that both m and n are even or zero. The expectation is zero if 
at least one of m or n is odd. This suggests that the distribution may 
be an appropriate model for uncorrelated but nonindependent data. 

From relations (4.21), (4.24), and (4.26) it is immediate that the
marginal densities are

$$f_X(x) = \sqrt{\mu_1}\left\{I\left(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{\nu+1}{2}; 1 - \phi\right)\sqrt{1 + \phi\mu_1 x^2}\left(1 + \mu_1 x^2\right)^{\nu/2}\right\}^{-1}$$

and

$$f_Y(y) = \sqrt{\mu_2}\left\{I\left(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{\nu+1}{2}; 1 - \phi\right)\sqrt{1 + \phi\mu_2 y^2}\left(1 + \mu_2 y^2\right)^{\nu/2}\right\}^{-1},$$

where x ∈ R and y ∈ R. Here, μ_j = λ_{j+1}/λ1, j = 1, 2 are the intensity
parameters while φ and ν are the dependence and scale parameters,
respectively. It is easily noted that X and Y are independent if and
only if φ = 1. The graph of the joint pdf is symmetric and bell-shaped
and takes the standard form when μ1 = μ2 = 1.

From relations (4.19)-(4.20) and (4.24)-(4.25) it is immediate that
X | Y has the Student's t distribution with degrees of freedom ν and
scale parameter (1/μ1)(1 + μ2y²)/(1 + φμ2y²), and that Y | X is also
Student's t with degrees of freedom ν and scale parameter (1/μ2)(1 +
μ1x²)/(1 + φμ1x²), where μ_j = λ_{j+1}/λ1, j = 1, 2. Consequently, the
conditional moments are


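$$E\left(X^m \mid Y = y\right) = \frac{\Gamma((m+1)/2)\,\Gamma((\nu - m)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)}\left\{\frac{1 + \mu_2 y^2}{\mu_1\left(1 + \phi\mu_2 y^2\right)}\right\}^{m/2}$$

and

$$E\left(Y^m \mid X = x\right) = \frac{\Gamma((m+1)/2)\,\Gamma((\nu - m)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)}\left\{\frac{1 + \mu_1 x^2}{\mu_2\left(1 + \phi\mu_1 x^2\right)}\right\}^{m/2}$$

provided that m is an even number less than ν. If m is odd, then the
corresponding conditional moments are zero.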

In the special case ν = 1, (4.26) reduces to the centered Cauchy
conditionals model of Anderson and Arnold (1991). The limiting case φ → 0
gives the bivariate Pearson type VII distribution (Johnson, 1987, page
117) with location parameters equal to zero and uncorrelated
components. If 2v is a positive integer, then this limit distribution reduces
to a special case of the general bivariate t distribution (see, for
example, Johnson and Kotz, 1972, page 134, relation 1) with uncorrelated
components and 2v degrees of freedom. For μ1 = μ2 and ν = 2, the
limit distribution reduces to the bivariate Cauchy distribution (Mardia,
1970a, page 86) while for μ1 = μ2 and ν = v + 1 it gives the bivariate
t distribution (Johnson and Kotz, 1972, page 134, relation 2) with v
degrees of freedom. In the latter case, the standard bivariate normal
distribution with independent components arises as a further limiting
case when v → ∞. Other special cases of (4.26) are the centered normal
conditionals model studied by Sarabia (1995) and the Beta conditionals
model of the second kind (Castillo and Sarabia, 1990).


4.10 Jones’ Bivariate t Distribution 


Let Z1, Z2, W be mutually independent random variables with Z_i
having the standard normal distribution and W having the chi-squared
distribution with degrees of freedom ν1. Then the standard bivariate t
distribution with degrees of freedom ν1 is the joint distribution of

$$\left(\frac{\sqrt{\nu_1}\,Z_1}{\sqrt{W}},\ \frac{\sqrt{\nu_1}\,Z_2}{\sqrt{W}}\right). \tag{4.31}$$

One disadvantage of this model is that the two univariate marginals
(which are Student's t) have the same degrees of freedom parameter
and hence the same amount of tailweight. Jones (2002b) provided an
alternative distribution with Student's t marginals, each with its own
arbitrary degrees of freedom parameter. Precisely, if W1, W2 are
independent chi-squared random variables (also independent of Z1, Z2)
with degrees of freedom ν1 and ν2 − ν1, respectively, then Jones (2002b)
considered the joint distribution of

$$(X_1, X_2) = \left(\frac{\sqrt{\nu_1}\,Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}\,Z_2}{\sqrt{W_1 + W_2}}\right). \tag{4.32}$$

Note that the ith marginal of this distribution is Student's t with degrees
of freedom ν_i. It is easy to see that the correlation between X1 and X2
is zero, a property also shared by (4.31). If r1 < ν1 and r1 + r2 < ν2,
then the product moment is given by

$$E\left(X_1^{r_1} X_2^{r_2}\right) = \frac{\nu_1^{r_1/2}\nu_2^{r_2/2}\,\Gamma\left(\frac{r_1+1}{2}\right)\Gamma\left(\frac{r_2+1}{2}\right)\Gamma\left(\frac{\nu_1 - r_1}{2}\right)\Gamma\left(\frac{\nu_2 - r_1 - r_2}{2}\right)}{\pi\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2 - r_1}{2}\right)}$$

if r1 and r2 are even and is zero otherwise. The joint pdf of X1 and X2
is


2 2\ —(l1+v2/2) A 
f (z1, 22) = C 14% Z2 F Pa “i, 
vı V2 2 9 
2 2 2 
neigia) (4.33) 
2 vi nı V2 


where 
Re ris) 
vanr (3) T (2) 
and ₂F₁ denotes the Gauss hypergeometric function. The conditional
pdf of X2 given X1 = x1 is


C= - 
T 


z2 


f (z2 | z1) = c(u+2 


Vg h — n lt+yv uy—1 


yer" 


2 2 2 2 ut 23/2 
(4.34) 
where u1 = 1 + x1²/ν1 and
(+ 2 py 
ems aia e. 
TT (152) 
If ν2 + 1 > r, then the conditional rth moment is given by
£ =V: Vv T ltr r ity = 
E (xs | Xı = zı) = v3 ulf 2+ o ECT be mal 
Var (5%) 
EF V —r+1 v—n, 1+ u -—1 
241 2 , 2 , 2 , Uy 
(4.35) 
if r is even and is zero otherwise. Setting ν2 = ν1 = ν in (4.34)-(4.35),
one obtains the corresponding forms for the standard bivariate
t distribution (see Section 1.11). Note that the conditional variance
Var(X2 | X1 = x1) increases with |x1|. In a parallel fashion, with
u2 = 1 + x2²/ν2, the conditional distribution of X1 given X2 = x2 is


2\ —(+/2) 
f(a, |.) = c(u+2) 
1 


Vy hn 1lt+yv ug 
«oF (14 2 V2 1, Qe 2 J 


g go> ag uz + £2 /1 

where 

ure) p (a)r (2) (1+ 4) 
vnr (3) r (52) 

This time, the conditional rth moments exist provided ν1 > r, unless
ν1 = ν2, in which case one needs 1 + ν1 > r. The odd conditional
moments are again zero and the even conditional moments are given by
the simpler form


C = 


T (+47) T (22) T (22) T (=t) 
— — _t/2,7/2 2 2 2 2 

E(Xj|X2.=%2) = vu Vat (4) 2 (282) r (45) 

The construction (4.32) can be easily extended to the multivariate
case. Two straightforward extensions are

• Let Z1, ..., Zp, W1, ..., Wp be mutually independent random
  variables with Z_i having the standard normal distribution and W_i having
  the chi-squared distribution with degrees of freedom ν_i − ν_{i−1}
  (with ν_0 = 0). Then,

  $$(X_1, X_2, \ldots, X_p) = \left(\frac{\sqrt{\nu_1}\,Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}\,Z_2}{\sqrt{W_1 + W_2}},\ \ldots,\ \frac{\sqrt{\nu_p}\,Z_p}{\sqrt{W_1 + \cdots + W_p}}\right) \tag{4.36}$$

  has a multivariate distribution with univariate marginals that are
  Student's t distributed with degrees of freedom ν_i, i = 1, ..., p. The
  bivariate marginals of (4.36) have the distribution of (4.32).

• With the notation as above,

  $$(X_1, X_2, \ldots, X_p) = \left(\frac{\sqrt{\nu_1}\,Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}\,Z_2}{\sqrt{W_1 + U_2}},\ \ldots,\ \frac{\sqrt{\nu_p}\,Z_p}{\sqrt{W_1 + U_p}}\right) \tag{4.37}$$

  has a multivariate distribution with the same univariate and bivariate
  marginals. Here, U_i are independent chi-squared random variables
  (also independent of Z_i, W_i) with degrees of freedom ν_i − ν1, i =
  2, ..., p.


Fig. 4.1. Jones' bivariate skew t pdf (4.33) for (a) ν1 = 2 and ν2 = 3; and (b)
ν1 = 2 and ν2 = 20


Further extensions of (4.32) arise by adding further independent
chi-squared random variables inside the square roots in the denominators
of all variables in (4.36) or by adding a single further independent
chi-squared random variable inside the square root in the denominator of
X1 in (4.37).
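The nested chi-squared construction lends itself to a short simulation. The sketch below is an illustration, assuming strictly increasing degrees of freedom ν1 < ν2 < ··· < νp so that every increment ν_i − ν_{i−1} is a valid chi-squared parameter; it also evaluates the product moment E(X1^{r1} X2^{r2}) = ν1^{r1/2} ν2^{r2/2} Γ((r1+1)/2) Γ((r2+1)/2) Γ((ν1−r1)/2) Γ((ν2−r1−r2)/2) / {π Γ(ν1/2) Γ((ν2−r1)/2)} for even r1, r2. Function names are hypothetical.

```python
import math
import random


def sample_jones_t(nus, rng):
    """One draw from the nested construction; nus must be strictly increasing."""
    draws = []
    cumulative_chi2 = 0.0
    previous = 0.0
    for nu in nus:
        # Chi-squared with (nu - previous) degrees of freedom as a gamma variate.
        cumulative_chi2 += rng.gammavariate((nu - previous) / 2.0, 2.0)
        draws.append(math.sqrt(nu) * rng.gauss(0.0, 1.0) / math.sqrt(cumulative_chi2))
        previous = nu
    return draws


def product_moment(r1, r2, nu1, nu2):
    """E(X1^r1 X2^r2) for the bivariate case, under the stated formula."""
    if r1 % 2 == 1 or r2 % 2 == 1:
        return 0.0
    if not (r1 < nu1 and r1 + r2 < nu2):
        raise ValueError("moment does not exist for these orders")
    log_value = (
        0.5 * r1 * math.log(nu1)
        + 0.5 * r2 * math.log(nu2)
        + math.lgamma((r1 + 1) / 2.0)
        + math.lgamma((r2 + 1) / 2.0)
        + math.lgamma((nu1 - r1) / 2.0)
        + math.lgamma((nu2 - r1 - r2) / 2.0)
        - math.log(math.pi)
        - math.lgamma(nu1 / 2.0)
        - math.lgamma((nu2 - r1) / 2.0)
    )
    return math.exp(log_value)
```

Setting r2 = 0 and r1 = 2 recovers the familiar Student's t variance ν1/(ν1 − 2), which is a convenient check on the gamma-function bookkeeping.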

Jones (2002a) provided another bivariate generalization of (4.31). This
generalization has the skew t distribution (Jones, 2001a) as its marginals.
If U denotes a standard beta random variable with parameters a and c,
then a skew t variate is defined by

$$X = \frac{\sqrt{a + c}\,(2U - 1)}{2\sqrt{U(1 - U)}}.$$

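The corresponding pdf is

$$f(x; a, c) = \frac{2^{1-a-c}\,\Gamma(a + c)}{\sqrt{a + c}\,\Gamma(a)\,\Gamma(c)}\left\{1 + \frac{x}{\sqrt{a + c + x^2}}\right\}^{a+1/2}\left\{1 - \frac{x}{\sqrt{a + c + x^2}}\right\}^{c+1/2}. \tag{4.38}$$

The standard Student's t is the particular case for ν = 2a when a = c.
If a > c, then (4.38) has positive skewness; also, f(x; b, a) = f(−x; a, b).
Further details about (4.38) are given in Jones and Faddy (2002). The
bivariate generalization proposed in Jones (2002a) is constructed in the
same way as (4.38): Specifically, if (U, V) denotes a Dirichlet random
vector with the joint pdf

$$f(u, v) = \frac{\Gamma(a + b + c)}{\Gamma(a)\,\Gamma(b)\,\Gamma(c)}\, u^{a-1} v^{b-1} (1 - u - v)^{c-1}$$

(where u > 0, v > 0, and u + v < 1), then define

$$(X_1, X_2) = \left(\frac{\sqrt{d}\,(2U - 1)}{2\sqrt{U(1 - U)}},\ \frac{\sqrt{d}\,(1 - 2V)}{2\sqrt{V(1 - V)}}\right), \tag{4.39}$$

where d = a + b + c. It can be verified that the joint pdf of (X1, X2) is

$$f(x_1, x_2) = \frac{d\,\Gamma(d + 1)}{2^{d-1}\,\Gamma(a)\,\Gamma(b)\,\Gamma(c)}\,\frac{\left(1 + \dfrac{x_1}{\sqrt{d + x_1^2}}\right)^{a-1}\left(1 - \dfrac{x_2}{\sqrt{d + x_2^2}}\right)^{b-1}}{\left(d + x_1^2\right)^{3/2}\left(d + x_2^2\right)^{3/2}}\left(\frac{x_2}{\sqrt{d + x_2^2}} - \frac{x_1}{\sqrt{d + x_1^2}}\right)^{c-1} \tag{4.40}$$

for x1 < x2.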
Because of a direct analogy with the Dirichlet distribution, only one of
the two marginals of (4.40) can be a symmetric Student's t distribution,
the other necessarily being skewed. This Student's t distribution will
have degrees of freedom d, and any skew t marginal will have a total
parameter value of d, but divided up into unequal amounts. In this
sense, marginals of (4.40) are most closely associated with Student's t
distributions with degrees of freedom d.

Note that if instead of (X1, X2) in (4.39) the transformation was made
to (−X1, X2), then one would have obtained the equivalent distribution
on x1 + x2 > 0. Also, (−X1, −X2) would have given the equivalent
distribution on x2 < x1 and (X1, −X2) as the same on x1 + x2 < 0.



Fig. 4.2. Jones’ bivariate skew t pdf (4.40) for (a) a = 1, and c 
a = 3, b = 4, and c = 5; (c) a = 5, b = 1, and c = 1; and (d)a = 1, b 
c=1 


The corresponding changes to (4.40) would simply have been to make
corresponding changes to the signs of x1 and x2.

The means and variances associated with (4.40) can be easily obtained
from the results provided in Jones (2001a):

$$E(X_1) = \frac{\sqrt{d}}{2}\,(a - b - c)\,\frac{\Gamma(a - 1/2)\,\Gamma(b + c - 1/2)}{\Gamma(a)\,\Gamma(b + c)},$$

$$E(X_2) = \frac{\sqrt{d}}{2}\,(a + c - b)\,\frac{\Gamma(a + c - 1/2)\,\Gamma(b - 1/2)}{\Gamma(a + c)\,\Gamma(b)},$$

$$\mathrm{Var}(X_1) = \frac{d}{4}\,\frac{(a - b - c)^2 + d - 2}{(a - 1)(b + c - 1)} - \left\{E(X_1)\right\}^2,$$

and

$$\mathrm{Var}(X_2) = \frac{d}{4}\,\frac{(a - b + c)^2 + d - 2}{(a + c - 1)(b - 1)} - \left\{E(X_2)\right\}^2.$$

The covariance between X1 and X2 appears not to be available in closed
form.
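The univariate skew t building block can be checked numerically. The sketch below assumes the density f(x; a, c) = 2^{1−a−c} Γ(a+c) {1 + x/√(a+c+x²)}^{a+1/2} {1 − x/√(a+c+x²)}^{c+1/2} / {√(a+c) Γ(a) Γ(c)} and the mean (a − c) √(a+c) Γ(a − 1/2) Γ(c − 1/2) / {2 Γ(a) Γ(c)}; the marginal means of (4.40) follow by taking the parameter pairs (a, b + c) and (a + c, b). Function names are illustrative.

```python
import math


def skew_t_pdf(x, a, c):
    """Skew t density with parameters a, c > 0 (assumed form of (4.38))."""
    r = x / math.sqrt(a + c + x * x)
    log_const = (
        (1.0 - a - c) * math.log(2.0)
        + math.lgamma(a + c)
        - 0.5 * math.log(a + c)
        - math.lgamma(a)
        - math.lgamma(c)
    )
    return math.exp(log_const) * (1.0 + r) ** (a + 0.5) * (1.0 - r) ** (c + 0.5)


def student_t_pdf(x, nu):
    """Ordinary Student's t density with nu degrees of freedom."""
    log_const = (
        math.lgamma((nu + 1.0) / 2.0)
        - 0.5 * math.log(math.pi * nu)
        - math.lgamma(nu / 2.0)
    )
    return math.exp(log_const) * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)


def skew_t_mean(a, c):
    """Mean of the skew t variate; requires a > 1/2 and c > 1/2."""
    log_ratio = (
        math.lgamma(a - 0.5) + math.lgamma(c - 0.5) - math.lgamma(a) - math.lgamma(c)
    )
    return 0.5 * (a - c) * math.sqrt(a + c) * math.exp(log_ratio)


def trapezoid(f, lower, upper, steps):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width
```

When a = c the two factors combine into {1 + x²/(2a)}^{−(2a+1)/2}, so the density coincides with Student's t with 2a degrees of freedom; the reflection property exchanges the roles of the two parameters.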


5 


Multivariate Generalizations and Related 
Distributions 


This chapter contains a large number of modifications and extensions
of the standard multivariate t distribution introduced in (1.1). Some of
them are of a somewhat complex nature, so a careful reading is required
to see the forest behind the trees!


5.1 Kshirsagar’s Noncentral Multivariate t Distribution 


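One of the earliest results in the area of noncentral multivariate t
distributions is that due to Kshirsagar (1961). Let Y be a p-variate random
vector having the normal distribution with mean vector μ, common
variance σ², and correlation matrix R. Let νS²/σ² be distributed independently
of Y according to a chi-squared distribution with degrees of freedom ν.
Kshirsagar (1961) considered the distribution of X = Y/S and showed
that it has the joint pdf

$$f(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}\boldsymbol{\xi}^T R^{-1}\boldsymbol{\xi}\right)\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}\left\{1 + \frac{1}{\nu}\mathbf{x}^T R^{-1}\mathbf{x}\right\}^{-(\nu+p)/2} \sum_{k=0}^{\infty}\frac{\Gamma((\nu + p + k)/2)}{k!\,\Gamma((\nu + p)/2)}\left[\frac{\sqrt{2}\,\boldsymbol{\xi}^T R^{-1}\mathbf{x}}{\sqrt{\nu + \mathbf{x}^T R^{-1}\mathbf{x}}}\right]^k, \tag{5.1}$$

where ξ = μ/σ. This noncentral distribution reduces to the form of
(1.1) when μ = 0.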
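In the univariate case p = 1, R = 1, the series in (5.1) is the classical noncentral t density, which makes a convenient numerical check. The sketch below assumes the form f(x) = exp(−ξ²/2) Γ((ν+1)/2) (1 + x²/ν)^{−(ν+1)/2} / {√(πν) Γ(ν/2)} × Σ_k Γ((ν+1+k)/2) {√2 ξx/√(ν + x²)}^k / {k! Γ((ν+1)/2)}, truncates the series at a fixed number of terms, and compares the result with known properties of the noncentral t. Function names are illustrative.

```python
import math


def noncentral_t_pdf(x, nu, xi, terms=80):
    """Series form of (5.1) for p = 1, R = 1, truncated after `terms` terms."""
    q = x * x
    log_lead = (
        -0.5 * xi * xi
        + math.lgamma((nu + 1.0) / 2.0)
        - 0.5 * math.log(math.pi * nu)
        - math.lgamma(nu / 2.0)
        - 0.5 * (nu + 1.0) * math.log1p(q / nu)
    )
    z = math.sqrt(2.0) * xi * x / math.sqrt(nu + q)
    series = 0.0
    for k in range(terms):
        log_coef = (
            math.lgamma((nu + 1.0 + k) / 2.0)
            - math.lgamma(k + 1)
            - math.lgamma((nu + 1.0) / 2.0)
        )
        series += math.exp(log_coef) * z ** k
    return math.exp(log_lead) * series


def trapezoid(f, lower, upper, steps):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width
```

Because the argument of the series is bounded by √2 |ξ| in absolute value, a moderate number of terms suffices for small noncentrality; larger |ξ| would need more terms.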

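We noted earlier in Section 1.11 that, if X has the central multivariate
t distribution with degrees of freedom ν and correlation matrix R, and

$$\mathbf{X} = \begin{pmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{pmatrix} \qquad\text{and}\qquad R = \begin{pmatrix}R_{11} & R_{12}\\ R_{21} & R_{22}\end{pmatrix},$$

where X1 is p1 × 1 and R11 is p1 × p1, then

$$\mathbf{Z}_1 = \mathbf{X}_1$$

and

$$\mathbf{Z}_2 = \sqrt{\frac{\nu + p_1}{\nu}}\left(1 + \frac{1}{\nu}\mathbf{X}_1^T R_{11}^{-1}\mathbf{X}_1\right)^{-1/2}\left(\mathbf{X}_2 - R_{21}R_{11}^{-1}\mathbf{X}_1\right)$$

are independently distributed, each according to a central multivariate
t distribution. This result does not remain true for the noncentral
distribution (5.1). Actually, Siotani (1976) showed that if

$$\boldsymbol{\xi} = \begin{pmatrix}\boldsymbol{\xi}_1\\ \boldsymbol{\xi}_2\end{pmatrix}$$

is the partition of ξ corresponding to that of X,

$$\boldsymbol{\xi}_{2\cdot 1} = \boldsymbol{\xi}_2 - R_{21}R_{11}^{-1}\boldsymbol{\xi}_1$$

and

$$R_{22\cdot 1} = R_{22} - R_{21}R_{11}^{-1}R_{12},$$

then the joint pdf of Z1 and Z2 is

$$f(\mathbf{z}_1, \mathbf{z}_2) = K \exp\left(-\frac{1}{2}\boldsymbol{\xi}_1^T R_{11}^{-1}\boldsymbol{\xi}_1 - \frac{1}{2}\boldsymbol{\xi}_{2\cdot 1}^T R_{22\cdot 1}^{-1}\boldsymbol{\xi}_{2\cdot 1}\right) t_\nu\left(\mathbf{z}_1; R_{11}, p_1\right) t_{\nu+p_1}\left(\mathbf{z}_2; R_{22\cdot 1}, p - p_1\right),$$

where the last two terms denote the pdfs of central multivariate t
distributions with appropriate parameters and K is given by the formidable
expression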
- Sy Peters DP) (2) (a) 
k 


zi Ri TE 


yl+2t Rī} zı /v 




x% (27 R21) 
{1 +27 R3212/(v + pi) 


Siotani (1976) also derived the corresponding noncentral distribution 
when X is partitioned into k sets of variates as in (1.20). Following the 
notations defined by (1.21), (1.22), (1.23), (1.24), and (1.25), the joint 
pdf of 


(k+1) /2 
} 


and 


1/2 
v+q ee p- i 
Z = -XAR `X 
141 \/ 7 (: + yin bir) o) 


Hi)T 
x (Xe as Ri)” RUX«) , 


is given by the lengthy expression 


k -1/2 

1 ((v + q)/2) [Ru-a-1)| 
P@ns.2) = ep | == Soe?) Fp ae 
say] loon nT ((v + qi-1)/2) 


k a Ro} z ~(v+q1)/2 
AL Mu-(u—1) 4 
1 
SGE s V + qı reer) 


A T((v+p+h +- aa )/2) 


l1 =0 1,=0 
k Lm /2 
2 
x SS 
Career 
lm 
T 1 
£ (z mR mm. (m-1 Baen) 
x Loane F 
mai (1+ 27, R, moni J2m ) 
where 
Z (m) - 
Em- (m~1) S Eni = Rim- 1) Re. 1 )§(m—1) 
and 


2 _ JT: —1 
Om = Em. (m-1)Rmm.(m-1)Êm-(m-1) : 




5.2 Miller’s Noncentral Multivariate t Distribution 


Let Y have the p-variate normal distribution with mean vector μ and
correlation matrix R > 0. Let S be distributed independently of Y
according to a ν-variate normal distribution with mean vector λ and
correlation matrix mI_ν. Miller (1968) considered the joint distribution
of

$$\mathbf{X}^T = (X_1, X_2, \ldots, X_p) = \left(\frac{Y_1}{|\mathbf{S}|},\ \frac{Y_2}{|\mathbf{S}|},\ \ldots,\ \frac{Y_p}{|\mathbf{S}|}\right),$$

which he referred to as the generalized p-dimensional t random vector.
Assuming |S|² has the chi-squared distribution, Miller showed that the
joint pdf of X is given by


—(v+p)/2 


91-(v+p)/2m 7T v+ 1 
fix) = | 


YE —+ <7R'x) 
T(v/2)r?/2 |R] 


m 
2 
Te ty m (xT R! y) 
xep] pe ihe ree 
JVmx? Ro u 
necp aT : (5:2) 


where D_{−(ν+p)}(·) is the parabolic cylinder function (see, for example,
Erdélyi et al., 1953). If μ = 0, then (5.2) reduces to


= m ”PT(( + p)/2) 1 Tp-l 
58) =) rae (= +xTRox 


AP v+p vy, IAP 
on {AE 2 ° 2’ 2m(mxTR-'x +1) |’ 


where ₁F₁ is the confluent hypergeometric function (see, for example,
Erdélyi et al., 1953). If both μ = 0 and λ = 0, then (5.2) reduces
to the usual central multivariate t distribution (1.1) with degrees of
freedom ν and correlation matrix R. To the best of our knowledge, this
interesting distribution given by (5.2) has not been pursued further since
its introduction some 35 years ago.




5.3 Stepwise Multivariate t Distribution 


Let Y be a p-variate normal random vector with mean vector μ, common
variance σ², and correlation matrix R > 0. Let νS²/σ² be a chi-squared
random variable with degrees of freedom ν, distributed independently
of Y. Then the joint distribution of


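$$X_1 = \frac{Y_1}{S}$$

and

$$X_k = \sqrt{\frac{\nu + k - 1}{1 - \bar{r}_k^2}}\ \frac{Y_k - \mathbf{r}_{k-1}^T R_{k-1}^{-1}\mathbf{Y}_{(k-1)}}{\sqrt{\nu S^2 + \mathbf{Y}_{(k-1)}^T R_{k-1}^{-1}\mathbf{Y}_{(k-1)}}}, \qquad k = 2, \ldots, p, \tag{5.3}$$

where \bar{r}_k denotes the multiple correlation coefficient between Y_k and
(Y_1, ..., Y_{k−1}),

$$\mathbf{Y}_{(k)} = (Y_1, Y_2, \ldots, Y_k)^T,$$

$$R_k = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1k}\\ r_{21} & 1 & \cdots & r_{2k}\\ \vdots & \vdots & & \vdots\\ r_{k1} & r_{k2} & \cdots & 1\end{pmatrix},$$

and

$$\mathbf{r}_k = \left(r_{1,k+1}, r_{2,k+1}, \ldots, r_{k,k+1}\right)^T,$$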

is known as the stepwise multivariate t distribution. This
distribution has applications in linear multiple regression analysis; for instance,
suppose that a random sample Y_1, ..., Y_n, corresponding to some
nonrandom values (x_{1i}, x_{2i}), i = 1, ..., n, is available. The null hypothesis
to be tested is that the slopes of the two simple regression lines, Y on
x1 and Y on x2, are both zero. Then, the X1 and X2 above could
correspond to the usual t statistics for testing the two regression coefficients
(Steffens, 1974).

Steffens (1969a) studied the distribution of (5.3) for the special case
R = I_p, the p × p identity matrix. In this case, since \bar{r}_k = 0, r_{k−1} = 0,
and R_{k−1} = I_{k−1}, (5.3) reduces to

$$X_1 = \frac{Y_1}{S}, \qquad X_k = \frac{\sqrt{\nu + k - 1}\,Y_k}{\sqrt{\nu S^2 + Y_1^2 + \cdots + Y_{k-1}^2}}, \qquad k = 2, \ldots, p.$$
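For R equal to the identity matrix the stepwise statistics can be computed in a single pass. The sketch below assumes the reduced form X1 = Y1/S and X_k = √(ν + k − 1) Y_k / √(νS² + Y1² + ··· + Y_{k−1}²) for k ≥ 2; the function name is illustrative.

```python
import math


def stepwise_t_identity(y, s, nu):
    """Stepwise t statistics for R = I, under the assumed reduced form.

    y  : list of the normal components Y_1, ..., Y_p
    s  : the scale estimate S (nu * S^2 / sigma^2 is chi-squared with nu df)
    nu : degrees of freedom of S
    """
    statistics = [y[0] / s]
    # Running value of nu * S^2 + Y_1^2 + ... + Y_{k-1}^2.
    accumulated = nu * s * s
    for k in range(2, len(y) + 1):
        accumulated += y[k - 2] ** 2
        statistics.append(math.sqrt(nu + k - 1) * y[k - 1] / math.sqrt(accumulated))
    return statistics
```

Each step absorbs the previous component into the denominator, which is why the degrees of freedom grow by one at every stage.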


If μ = 0, Steffens (1969a) showed that X1, X2, ..., Xp are independent
Student's t random variables with degrees of freedom ν, ν + 1, ..., ν + p − 1,
respectively. This result also holds for general R (Siotani, 1976,
Corollary 3.1). In the noncentral case μ ≠ 0, the X_j's are still independent,
but X1 has the noncentral t distribution with degrees of freedom ν and
noncentrality parameter μ1/σ while the X_j's (j = 2, 3, ..., p) have the
doubly noncentral t distributions with degrees of freedom ν + j − 1 and
noncentrality parameter μ_j/σ in the numerator and (μ1/σ)² + (μ2/σ)² +
··· + (μ_{j−1}/σ)² in the denominator. Steffens derived the joint pdf of the
X_j's in the bivariate case as the double infinite series


= exp { (57 + 632) ) 7/2} S (V261)' m 
f (21,22) = SC ae py1/(2l) +1) 09 


PE E. of : (v+k+1)/2 
(Stet gti eg (its) 


z} ( z2 oo 


? 


where δ_j = μ_j/σ, j = 1, 2. For general p and R, Siotani (1976) showed
that if μ = 0, then X1, X2, ..., Xp are still independent Student's t
random variables with degrees of freedom ν, ν + 1, ..., ν + p − 1, respectively.
In the noncentral case, the joint pdf of the X_j in (5.3) generalizes to


_ E 
f (E12) = a (- 5 et) ee 


p a (v+k)/2 
«TI {1+ ea} 


i=l 
n T ((v + ky +--+ +kp)/2) 
An = P(O F/D! 


Pp 2 kı/2 
Uai 


` 


P ki 


(x17) 


x pean | a, 
t=1 {1 + z? /(v + j= ih Gases 


where the 7;’s are the noncentrality parameters given by 


GT RG $1) 
Tt = — m, 


En) = (61, €2,---.&), 


and ξ_j = μ_j/σ.


5.4 Siotani’s Noncentral Multivariate t Distribution 


In Section 4.5, we discussed a bivariate generalization of the doubly
noncentral univariate t distribution given by (4.9). Siotani (1976)
provided a multivariate generalization of (4.9) by observing that the pdf of
S* = S/√ν (where S² is a noncentral chi-squared random variable with
degrees of freedom ν and noncentrality parameter λ) can be expressed
as the Poisson mixture

$$f(s^*) = \sum_{k=0}^{\infty} P_k(\lambda)\, f_{\nu+2k}(s^*),$$

where

$$P_k(\lambda) = \frac{\exp(-\lambda)\lambda^k}{k!}$$

is the kth probability of the Poisson distribution with parameter λ and

$$f_{\nu+2k}(s^*) = \frac{2(\nu/2)^{(\nu+2k)/2}}{\Gamma((\nu + 2k)/2)}\,(s^*)^{\nu+2k-1}\exp\left(-\frac{\nu}{2}s^{*2}\right).$$

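He defined X = Y/S* to have the doubly noncentral multivariate t
distribution, where Y is a multivariate normal random vector with mean
vector μ, unit variances, and correlation matrix R. The joint pdf of X
is easily obtained as a Poisson mixture of the noncentral pdf (5.1) with
ν + 2k in place of ν in the arguments of gamma functions and in the
power of 1 + xᵀR⁻¹x/ν, that is,

$$f(\mathbf{x}) = \sum_{k=0}^{\infty} P_k(\lambda)\, f_{p,k}(\mathbf{x}), \tag{5.4}$$

where, with ξ = μ because the variances are unity,

$$f_{p,k}(\mathbf{x}) = \frac{\exp\left\{-\frac{1}{2}\boldsymbol{\xi}^T R^{-1}\boldsymbol{\xi}\right\}\Gamma((\nu + 2k + p)/2)}{(\nu\pi)^{p/2}\,\Gamma((\nu + 2k)/2)\,|R|^{1/2}}\left[1 + \frac{1}{\nu}\mathbf{x}^T R^{-1}\mathbf{x}\right]^{-(\nu+2k+p)/2} \sum_{l=0}^{\infty}\frac{\Gamma((\nu + 2k + p + l)/2)}{l!\,\Gamma((\nu + 2k + p)/2)}\left[\frac{\sqrt{2}\,\boldsymbol{\xi}^T R^{-1}\mathbf{x}}{\sqrt{\nu + \mathbf{x}^T R^{-1}\mathbf{x}}}\right]^l.$$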


l=0 


5.5 Arellano-Valle and Bolfarine’s Generalized t Distribution 


Arellano-Valle and Bolfarine (1995) considered what is being referred to
as a generalized t distribution within the class of elliptical distributions.
The distribution is defined by

$$\mathbf{X} = \boldsymbol{\mu} + V^{1/2}\mathbf{Y}, \tag{5.5}$$

where V has the inverse gamma distribution given by the pdf

$$f(v) = \frac{(\lambda/2)^{\nu/2}}{\Gamma(\nu/2)}\, v^{-(\nu/2+1)}\exp\left(-\frac{\lambda}{2v}\right), \qquad v > 0,$$

and Y is distributed independently of V according to a p-dimensional
normal distribution with mean vector 0 and covariance matrix R. We
shall write X ~ t_p(μ, R; λ, ν). When λ = ν, this distribution reduces to
the usual multivariate t distribution (1.1) with mean vector μ,
correlation matrix R, and degrees of freedom ν. For R > 0, the joint pdf of
X ~ t_p(μ, R; λ, ν) is

$$f(\mathbf{x}) = \frac{\Gamma((\nu + p)/2)\,\lambda^{\nu/2}}{\pi^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}\left\{\lambda + (\mathbf{x} - \boldsymbol{\mu})^T R^{-1}(\mathbf{x} - \boldsymbol{\mu})\right\}^{-(\nu+p)/2}. \tag{5.6}$$

It is easy to observe from (5.5) that

$$E(\mathbf{X}) = \boldsymbol{\mu}, \qquad \nu > 1, \tag{5.7}$$

and

$$\mathrm{Var}(\mathbf{X}) = \frac{\lambda}{\nu - 2}R, \qquad \nu > 2. \tag{5.8}$$
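The stochastic representation (5.5) translates directly into a sampler. The sketch below treats the univariate case p = 1 with R a positive scalar, and assumes that V is inverse gamma with shape ν/2 and scale λ/2, so that 1/V is a gamma variate with shape ν/2 and scale 2/λ; the function name is illustrative.

```python
import math
import random


def sample_generalized_t(mu, r, lam, nu, rng):
    """Draw X = mu + sqrt(V) * Y with Y ~ N(0, r) and V inverse gamma.

    Assumption: 1 / V ~ Gamma(shape = nu / 2, scale = 2 / lam).
    """
    v = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / lam)
    return mu + math.sqrt(v) * rng.gauss(0.0, math.sqrt(r))


def sample_mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, variance
```

With λ = ν the draws are ordinary Student's t variates scaled by √r, so the extra parameter λ can be read as a separate control on the spread, Var(X) = λr/(ν − 2).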


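Furthermore, for an m × 1 vector η and an m × p matrix B,

$$\mathbf{Z} = \boldsymbol{\eta} + B\mathbf{X} = (\boldsymbol{\eta} + B\boldsymbol{\mu}) + V^{1/2}B\mathbf{Y} \sim t_m\left(\boldsymbol{\eta} + B\boldsymbol{\mu},\ BRB^T;\ \lambda, \nu\right) \tag{5.9}$$

since BY has the m-dimensional normal distribution with mean vector
0 and covariance matrix BRBᵀ. Now let

$$\mathbf{X} = \begin{pmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{pmatrix}, \tag{5.10}$$

$$\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix}, \tag{5.11}$$

and

$$R = \begin{pmatrix}R_{11} & R_{12}\\ R_{21} & R_{22}\end{pmatrix}, \tag{5.12}$$

where X1 is m × 1, R11 is m × m and so on. Taking B = [I_m, 0] in (5.9),
note that X1 ~ t_m(μ1, R11; λ, ν). By symmetry, X2 ~ t_{p−m}(μ2, R22;
λ, ν). Assuming R > 0, let

$$\boldsymbol{\mu}_1(\mathbf{x}_2) = \boldsymbol{\mu}_1 + R_{12}R_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$R_{11\cdot 2} = R_{11} - R_{12}R_{22}^{-1}R_{21},$$

and

$$q(\mathbf{x}_2) = (\mathbf{x}_2 - \boldsymbol{\mu}_2)^T R_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2).$$

Using the fact that

$$|R| = |R_{11\cdot 2}|\,|R_{22}|$$

and

$$(\mathbf{x} - \boldsymbol{\mu})^T R^{-1}(\mathbf{x} - \boldsymbol{\mu}) = \left(\mathbf{x}_1 - \boldsymbol{\mu}_1(\mathbf{x}_2)\right)^T R_{11\cdot 2}^{-1}\left(\mathbf{x}_1 - \boldsymbol{\mu}_1(\mathbf{x}_2)\right) + q(\mathbf{x}_2),$$

note that the conditional pdf of X1 given X2 = x2 is given by

$$f(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{\Gamma((\nu + p)/2)\left\{\lambda + q(\mathbf{x}_2)\right\}^{(\nu+p-m)/2}}{\pi^{m/2}\,\Gamma((\nu + p - m)/2)\,|R_{11\cdot 2}|^{1/2}}\left[\lambda + q(\mathbf{x}_2) + \left(\mathbf{x}_1 - \boldsymbol{\mu}_1(\mathbf{x}_2)\right)^T R_{11\cdot 2}^{-1}\left(\mathbf{x}_1 - \boldsymbol{\mu}_1(\mathbf{x}_2)\right)\right]^{-(\nu+p)/2}.$$

This means that

$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim t_m\left(\boldsymbol{\mu}_1(\mathbf{x}_2),\ R_{11\cdot 2};\ \lambda + q(\mathbf{x}_2),\ \nu + p - m\right). \tag{5.13}$$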




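Note that when λ = ν, X1 | X2 = x2 ~ t_m(μ1(x2), R_{11·2}; ν + q(x2), ν +
p − m). Since q(x2) ≠ p − m, this shows that the usual t distribution
does not retain its conditional distributions (see Section 1.11). Finally,
it follows from (5.13) and (5.7)-(5.8) that

$$E(\mathbf{X}_1 \mid \mathbf{X}_2) = \boldsymbol{\mu}_1 + R_{12}R_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)$$

and

$$\mathrm{Cov}(\mathbf{X}_1 \mid \mathbf{X}_2) = \frac{\lambda + (\mathbf{X}_2 - \boldsymbol{\mu}_2)^T R_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)}{\nu + p - m - 2}\left(R_{11} - R_{12}R_{22}^{-1}R_{21}\right).$$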
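In the bivariate case (p = 2, m = 1) the conditional parameters reduce to scalar arithmetic. The sketch below assumes the conditional law t_1(μ1(x2), R_{11·2}; λ + q(x2), ν + 1) with μ1(x2) = μ1 + r12 (x2 − μ2)/r22, R_{11·2} = r11 − r12²/r22, and q(x2) = (x2 − μ2)²/r22; the function name and the returned dictionary keys are illustrative.

```python
def conditional_parameters(x2, mu, r, lam, nu):
    """Parameters of X1 | X2 = x2 for a bivariate generalized t vector.

    mu : (mu1, mu2)
    r  : ((r11, r12), (r21, r22)), symmetric and positive definite
    Returns location, scale, updated lambda and nu, and the conditional
    variance (None when it does not exist).
    """
    mu1, mu2 = mu
    r11, r12, r22 = r[0][0], r[0][1], r[1][1]
    location = mu1 + r12 / r22 * (x2 - mu2)
    scale = r11 - r12 * r12 / r22
    q = (x2 - mu2) ** 2 / r22
    lam_cond = lam + q
    nu_cond = nu + 1  # p - m = 1 in the bivariate case
    variance = lam_cond / (nu_cond - 2) * scale if nu_cond > 2 else None
    return {
        "location": location,
        "scale": scale,
        "lambda": lam_cond,
        "nu": nu_cond,
        "variance": variance,
    }
```

The conditional variance grows with the squared distance of x2 from μ2, in contrast to the normal case where it is constant.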


Arellano-Valle and Bolfarine (1995) also presented characterizations
of the generalized t distribution (5.5) in terms of marginal distributions,
conditional distributions, quadratic forms, and within the class of
compound normal distributions. Briefly, these characterizations are

• Let X have the p-variate elliptically symmetric distribution with mean
  vector μ and covariance matrix R (for a definition of an elliptically
  symmetric distribution see, for example, Fang et al., 1990). Then, any
  marginal distribution is a generalized t distribution if and only if X
  has a generalized t distribution.

• Let X = (X1ᵀ, X2ᵀ)ᵀ have the p-variate elliptically symmetric
  distribution with mean vector μ and covariance matrix R, where X1 is
  m × 1. Then, the conditional distribution of X1 given X2 is the
  generalized m-variate t distribution if and only if the distribution of X is
  the generalized p-variate t distribution. The proof of this result, which
  assumes the existence of a density, is similar to the proof considered
  in the pioneering paper by Kelker (1970) for the characterization of
  the multivariate normal distribution.

• Let X have the p-variate elliptically symmetric distribution with mean
  vector 0 and covariance matrix I_p, and let A be a symmetric p × p
  matrix. Then, XᵀAX ~ (mλ/ν) F_{m,ν} if and only if X ~ t_p(0, I_p; λ, ν),
  A² = A, and rank(A) = m. This result is proved by utilizing
  Anderson and Fang's (1987) assertion on the spherical distributions that
  put zero mass at the origin.

The fourth characterization within the class of compound normal
distributions is a consequence of a well known theorem due to Diaconis and
Ylvisaker (1979), which asserts that, in the regular exponential family
with the natural parameterization, if the posterior expectation is
linear, then the prior distribution must be conjugate. It states that if




X,,Xo,... is an infinite sequence of orthogonally invariant random vari- 
ables (which means that for each p, X = (X1,..., Xp)? and TX are 
identically distributed, for all p x p orthogonal matrices T) such that 
X, = 0 with probability zero and 


Var(X2|X1) = b+aX?, O<a<i1, b>0, (5.14) 


then X is distributed as ¢,(0,1,;b/a, (a+1)/a). The converse also holds. 
Arellano-Valle et al. (1994) pointed out that (5.14) could be extended
to yield a location mixture of generalized $t$ distributions as follows. Let
$X_1, X_2, \ldots$ be an infinite sequence of random variables such that for each
$p$, $\mathbf{X} = (X_1, \ldots, X_p)^T$ and $\Gamma\mathbf{X}$ are identically distributed, for all $p \times p$
orthogonal matrices $\Gamma$ satisfying $\Gamma\mathbf{1}_p = \mathbf{1}_p$ (where $\mathbf{1}_p$ is a $p$-dimensional
vector of 1’s). Under this assumption there exist random variables $M$
and $V$ such that, conditional on $M$ and $V$, $X_1, X_2, \ldots$ are independent
and normally distributed with mean $M$ and variance $V$. Actually, $M$
and $V$ can be interpreted as the limits

$$\frac{1}{n}\sum_{i=1}^{n} X_i \to M$$

and

$$\frac{1}{n}\sum_{i=1}^{n} (X_i - M)^2 \to V$$

as $n \to \infty$, where the convergence is with probability 1. Furthermore, if

$$\mathrm{Var}\{(X_2 - M) \mid X_1, M\} = a(X_1 - M)^2 + b, \qquad 0 < a < 1, \quad b > 0, \tag{5.15}$$

then $\mathbf{X}$ is a location mixture of $t_p(M\mathbf{1}_p, \mathbf{I}_p; b/a, (a+1)/a)$ and, in
addition, $M$ and $V$ are independent. Because of the form of the conditions
(5.14)-(5.15), these two results are known as the predictivistic
characterizations of the generalized $t$ distribution. These results could be
extended further to the matrix-variate $t$ distributions (see Section 5.11).
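
The direct half of the characterization based on (5.14) can be checked numerically. The sketch below assumes that the generalized $t_p(\mathbf{0}, \mathbf{I}_p; \lambda, \nu)$ density is proportional to $(1 + \mathbf{x}^T\mathbf{x}/\lambda)^{-(\nu+p)/2}$, takes $p = 2$, and integrates the conditional kernel with the trapezoidal rule; the function name, the grid settings, and the values of $a$ and $b$ are illustrative choices only.

```python
import math

def conditional_variance(x1, lam, nu, half_width=400.0, step=0.01):
    """Var(X2 | X1 = x1) for a bivariate density proportional to
    (1 + (x1^2 + x2^2)/lam)^(-(nu + 2)/2), by the trapezoidal rule.
    The conditional mean is zero by symmetry, so only E(X2^2 | x1) is needed."""
    n = int(round(2.0 * half_width / step))
    num = den = 0.0
    for k in range(n + 1):
        x2 = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        kernel = (1.0 + (x1 * x1 + x2 * x2) / lam) ** (-(nu + 2.0) / 2.0)
        num += weight * x2 * x2 * kernel
        den += weight * kernel
    return num / den

# Parameters implied by (5.14): lambda = b/a and nu = (a + 1)/a.
a, b = 0.25, 1.0
lam, nu = b / a, (a + 1.0) / a

checks = [(x1, conditional_variance(x1, lam, nu), b + a * x1 * x1)
          for x1 in (0.0, 1.0, 2.0)]
for x1, numeric, linear in checks:
    print(f"x1={x1:.1f}  numeric={numeric:.5f}  b + a*x1^2={linear:.5f}")
```

Under the stated assumption the numerical conditional variance should agree with the linear form $b + aX_1^2$ at each conditioning value.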


5.6 Fang et al.’s Asymmetric Multivariate t Distribution 


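Fang et al. (2002) introduced an asymmetric $p$-variate $t$ distribution with
degrees of freedom $(m_1, m_2, \ldots, m_p)$. Its joint pdf is given by

$$f(\mathbf{x}) = \psi\left(T_m^{-1}(T_{m_1}(x_1)), \ldots, T_m^{-1}(T_{m_p}(x_p)); \mathbf{R}\right) \prod_{i=1}^{p} t_{m_i}(x_i), \tag{5.16}$$

where

$$\psi(y_1, \ldots, y_p; \mathbf{R}) = \frac{\Gamma((m+p)/2)\,\Gamma^{p-1}(m/2)}{\Gamma^{p}((m+1)/2)\,|\mathbf{R}|^{1/2}} \left(1 + \frac{\mathbf{y}^T\mathbf{R}^{-1}\mathbf{y}}{m}\right)^{-(m+p)/2} \prod_{i=1}^{p}\left(1 + \frac{y_i^2}{m}\right)^{(m+1)/2}.$$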

Here, $\mathbf{R}$ denotes the correlation matrix, and $t_m$ and $T_m$, respectively,
denote the pdf and the cdf of the Student’s $t$ distribution with degrees
of freedom $m$. Note that the marginals of (5.16) have different degrees
of freedom. In the particular case $m_i = m$, $i = 1, \ldots, p$, (5.16) reduces
to the usual $p$-variate $t$ distribution with degrees of freedom $m$.
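
The reduction to the usual $t$ distribution when $m_i = m$ can be illustrated for $p = 2$. The sketch below assumes that the weighting function in (5.16) is the ratio of the bivariate $t$ density to the product of its univariate $t$ marginals (written here as `weight`, a hypothetical name); since $T_m^{-1}(T_m(x)) = x$ in the equal-degrees-of-freedom case, no quantile function is needed and only the standard library is used.

```python
import math

def t_pdf(x, m):
    """Univariate Student t density with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))

def bivariate_t_pdf(y1, y2, m, r):
    """Bivariate t density with m degrees of freedom and correlation r."""
    q = (y1 * y1 - 2 * r * y1 * y2 + y2 * y2) / (1 - r * r)
    # Gamma((m+2)/2) / (Gamma(m/2) * m * pi) simplifies to 1 / (2 * pi).
    return (1 + q / m) ** (-(m + 2) / 2) / (2 * math.pi * math.sqrt(1 - r * r))

def weight(y1, y2, m, r):
    """Weighting function of (5.16) for p = 2 (assumed form)."""
    p = 2
    q = (y1 * y1 - 2 * r * y1 * y2 + y2 * y2) / (1 - r * r)
    log_c = (math.lgamma((m + p) / 2) + (p - 1) * math.lgamma(m / 2)
             - p * math.lgamma((m + 1) / 2) - 0.5 * math.log(1 - r * r))
    log_k = (-(m + p) / 2 * math.log1p(q / m)
             + (m + 1) / 2 * (math.log1p(y1 * y1 / m) + math.log1p(y2 * y2 / m)))
    return math.exp(log_c + log_k)

# Equal degrees of freedom: (5.16) is weight(x1, x2) * t_m(x1) * t_m(x2).
m, r = 4.0, 0.5
x1, x2 = 0.8, -1.3
asymmetric = weight(x1, x2, m, r) * t_pdf(x1, m) * t_pdf(x2, m)
usual = bivariate_t_pdf(x1, x2, m, r)
print(asymmetric, usual)
```

The general case with unequal $m_i$ would additionally need the $t$ cdf and its inverse, which are not part of the Python standard library.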


5.7 Gupta’s Skewed Multivariate t Distribution 


In the next four sections (starting with this section), we shall discuss
skewed multivariate $t$ distributions, a topic that has received special
attention in the last few years, following the introduction of the skewed
multivariate normal distribution in the classical paper by Azzalini and
Dalla Valle (1996). A careful reader will observe that the possibilities of
constructing skewed multivariate $t$ distributions are practically limitless.

A $p$-variate random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_p)^T$ is said to have the
skewed normal distribution if its joint pdf is given by

$$f_{\mathbf{Y}}(\mathbf{y}) = 2\phi_p(\mathbf{y}; \Sigma)\,\Phi(\alpha^T\mathbf{y}), \qquad \mathbf{y} \in \mathbb{R}^p, \tag{5.17}$$

where $\Sigma > 0$ (with $\mathbf{R}$ denoting the corresponding correlation matrix),
$\alpha \in \mathbb{R}^p$, $\phi_p(\mathbf{y}; \Sigma)$ is the $p$-dimensional normal density with zero means
and covariance matrix $\Sigma$, and $\Phi(\cdot)$ is the cdf of the standard normal
distribution. Let $W$ be a chi-squared random variable with degrees of
freedom $\nu$, distributed independently of $\mathbf{Y}$. Gupta (2000) defined the
joint distribution of

$$X_j = \frac{Y_j}{\sqrt{W/\nu}}, \qquad j = 1, 2, \ldots, p \tag{5.18}$$

as the skewed multivariate $t$ distribution with degrees of freedom $\nu$. The
joint pdf of (5.18) is given by

$$f_{\mathbf{X}}(\mathbf{x}; \alpha) = 2 f_{\nu}(\mathbf{x})\, F_{\nu+p}\left(\alpha^T\mathbf{x}\sqrt{\frac{\nu+p}{\nu + \mathbf{x}^T\Sigma^{-1}\mathbf{x}}}\right), \tag{5.19}$$


Fig. 5.1. Fang et al.’s asymmetric $t$ pdf (5.16) in the bivariate case: (a) $m = 2$,
$m_1 = 10$, $m_2 = 10$, and $r_{12} = 0$; (b) $m = 2$, $m_1 = 10$, $m_2 = 2$, and $r_{12} = 0$; (c)
$m = 2$, $m_1 = 10$, $m_2 = 10$, and $r_{12} = 0.5$; and (d) $m = 2$, $m_1 = 10$, $m_2 = 10$,
and $r_{12} = 0.9$


where $\mathbf{x} \in \mathbb{R}^p$. Here, $f_k$ and $F_k$, respectively, denote the joint pdf of the
central $p$-variate $t$ distribution with correlation matrix $\mathbf{R}$ and degrees
of freedom $k$ and the cdf of the Student’s $t$ distribution with degrees of
freedom $k$. From the definition (5.18) and the joint pdf (5.19), Gupta
noted the following properties:

• If $\alpha = \mathbf{0}$, then (5.19) reduces to the central $p$-variate $t$ distribution
with correlation matrix $\mathbf{R}$ and degrees of freedom $\nu$.

• The skewed multivariate $t$ distribution approaches the skewed
multivariate normal distribution as $\nu \to \infty$, that is,

$$\lim_{\nu \to \infty} f_{\mathbf{X}}(\mathbf{x}; \alpha) = 2\phi_p(\mathbf{x}; \Sigma)\,\Phi(\alpha^T\mathbf{x}).$$

• Since $Y_j^2$ is a chi-squared random variable with 1 degree of freedom,

$$X_j^2 = \frac{Y_j^2}{W/\nu} \sim F_{1,\nu},$$

the $F$ distribution with degrees of freedom 1 and $\nu$; furthermore, the
joint distribution of $(X_1^2, X_2^2, \ldots, X_p^2)$ is multivariate $F$ with
parameters $1, 1, \ldots, 1, \nu + p$.

• Since $\mathbf{Y}^T\Sigma^{-1}\mathbf{Y}$ is a chi-squared random variable with degrees of
freedom $p$, the quadratic form

$$\mathbf{X}^T\Sigma^{-1}\mathbf{X} = \frac{\mathbf{Y}^T\Sigma^{-1}\mathbf{Y}}{W/\nu} \sim pF_{p,\nu};$$

note that, as in the case of the multivariate normal, the distribution
of this quadratic form does not depend on $\alpha$.
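
The construction (5.18) and the last property can be illustrated by simulation. The sketch below assumes the standard case $\Sigma = \mathbf{I}_p$ and draws the skewed normal vector by a sign-flip device ($\mathbf{Y} = \mathbf{Z}$ if an independent standard normal $U_0 \le \alpha^T\mathbf{Z}$, and $\mathbf{Y} = -\mathbf{Z}$ otherwise), which is one common way of sampling from (5.17) and is not taken from the text; all function names are illustrative.

```python
import math
import random

def sample_skew_normal(alpha, rng):
    """One draw from 2 * phi_p(y; I) * Phi(alpha' y) via a sign flip."""
    z = [rng.gauss(0.0, 1.0) for _ in alpha]
    u0 = rng.gauss(0.0, 1.0)
    if u0 <= sum(a * zi for a, zi in zip(alpha, z)):
        return z
    return [-zi for zi in z]

def sample_skew_t(alpha, nu, rng):
    """One draw from (5.18): X_j = Y_j / sqrt(W / nu), W chi-squared(nu)."""
    y = sample_skew_normal(alpha, rng)
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu degrees of freedom
    scale = math.sqrt(w / nu)
    return [yi / scale for yi in y]

def quadratic_forms(alpha, nu, n, seed):
    """n draws of X'X (the quadratic form with Sigma = I)."""
    rng = random.Random(seed)
    return [sum(x * x for x in sample_skew_t(alpha, nu, rng)) for _ in range(n)]

nu, n = 10.0, 20000
q_symmetric = quadratic_forms([0.0, 0.0], nu, n, seed=1)
q_skewed = quadratic_forms([3.0, -1.0], nu, n, seed=1)

# X'X / p should behave like F_{p, nu}, whose mean is nu / (nu - 2).
mean_ratio = sum(q_skewed) / (2 * n)
print(mean_ratio, nu / (nu - 2.0))
```

Because the sign flip leaves $\mathbf{Y}^T\mathbf{Y}$ unchanged, the two runs with a common seed produce identical quadratic forms, which is the simulation counterpart of the statement that the distribution does not depend on $\alpha$.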


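The special case of (5.19) for $\Sigma = \mathbf{I}_p$ is called the standard skewed
multivariate $t$ distribution. If, in addition, $\nu = 1$, then it is defined as
the skewed multivariate Cauchy distribution with the joint pdf

$$f_{\mathbf{X}}(\mathbf{x}; \alpha) = \frac{2\,\Gamma((p+1)/2)}{\pi^{(p+1)/2}} \left(1 + \sum_{j=1}^{p} x_j^2\right)^{-(p+1)/2} F_{p+1}\left(\frac{\alpha^T\mathbf{x}\sqrt{p+1}}{\sqrt{1 + \sum_{j=1}^{p} x_j^2}}\right), \qquad \mathbf{x} \in \mathbb{R}^p.$$

It should be noted that the above does not belong to the class of
elliptically symmetric distributions, whereas the multivariate Cauchy does.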

Using results in Azzalini and Dalla Valle (1996) and Gupta and Kollo 
(2000), the mean vector and the covariance matrix associated with (5.19) 
are calculated as 


—(p+1)/2 


2v La 
HRS Er 


y > 2, 


Cov (X) v | 2(v + 4jaaT E 


Taea Pav —2)(1+ aFZa)|’ 
v >A, 


respectively. Furthermore, using the definition (5.18), the product
moments are easily obtained as


P 
Phe tajat = E Ú t 
j=1 


vB (W ve(H “| 
vv ( wh) =r) Tj 
2 T(v/2) e (fr j 


for $r < \nu/2$, where $r = r_1 + r_2 + \cdots + r_p$. If $Y_1, Y_2, \ldots, Y_p$ are mutually
independent, then the right-hand side can be easily calculated.

Branco and Dey (2001) noted that the joint pdf (5.19) is a particular 
case of a general class of skewed multivariate elliptical distributions. 
Actually, the joint pdf of the general class takes the form 


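$$f(\mathbf{x}) = 2 f_{\nu,\tau}(\mathbf{x})\, F_{\nu^*,\tau^*}\left(\lambda^T(\mathbf{x} - \mu)\right), \tag{5.20}$$

where $\nu^* = \nu + p$,

$$\tau^* = \tau + (\mathbf{x} - \mu)^T\mathbf{R}^{-1}(\mathbf{x} - \mu),$$

$$\lambda^T = \frac{\alpha^T\mathbf{R}^{-1}}{\sqrt{1 - \alpha^T\mathbf{R}^{-1}\alpha}},$$

$$f_{\nu,\tau}(\mathbf{x}) = \frac{\Gamma((\nu+p)/2)}{\Gamma(\nu/2)\,(\pi\tau)^{p/2}\,|\mathbf{R}|^{1/2}} \left[1 + \frac{(\mathbf{x} - \mu)^T\mathbf{R}^{-1}(\mathbf{x} - \mu)}{\tau}\right]^{-(\nu+p)/2},$$

and

$$F_{\nu^*,\tau^*}(x) = \frac{(\tau^*)^{\nu^*/2}\,\Gamma((\nu^*+1)/2)}{\sqrt{\pi}\,\Gamma(\nu^*/2)} \int_{-\infty}^{x} \left(\tau^* + y^2\right)^{-(\nu^*+1)/2} dy.$$

Note that $f_{\nu,\tau}(\mathbf{x})$ is the generalized $t$ pdf described in equation (5.6),
and that $F_{\nu^*,\tau^*}(x)$ is the cdf of a generalized version of the Student’s $t$
distribution. The mean and the variance of the univariate marginals of
(5.20) are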
_ af ((y—1)/2) fv 
i T (v/2) E 


(provided v > 1) and 
yay [((v—1)/2)]? 
Var OO =. = Sa 
ne) a r T (v/2) | 


(provided v > 2), respectively. 


5.8 Sahu et al.’s Skewed Multivariate t Distribution 


Using transformation and conditioning, Sahu et al. (2000) obtained a
skewed multivariate $t$ distribution given by the joint pdf


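$$f(\mathbf{x}) = 2^m\, t_{m,\nu}\left(\mathbf{x}; \mu, \mathbf{R} + \mathbf{D}^2\right) T_{m,\nu+m}\left(\sqrt{\frac{\nu+m}{\nu + q(\mathbf{y})}}\, \left(\mathbf{I} - \mathbf{D}(\mathbf{R} + \mathbf{D}^2)^{-1}\mathbf{D}\right)^{-1/2} \mathbf{D}\,(\mathbf{R} + \mathbf{D}^2)^{-1}\mathbf{y}\right), \tag{5.21}$$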


where $\mathbf{y} = \mathbf{x} - \mu$, $q(\mathbf{y}) = \mathbf{y}^T(\mathbf{R} + \mathbf{D}^2)^{-1}\mathbf{y}$, and $\mathbf{D}$ is a diagonal matrix
with the skewness parameters $\delta_1, \ldots, \delta_m$. In (5.21), $t_{m,\nu}(\mu, \Omega)$ denotes
the usual $m$-variate $t$ density with mean vector $\mu$, correlation matrix $\Omega$,
and degrees of freedom $\nu$. Furthermore, $T_{m,\nu+m}(\cdot)$ denotes the joint cdf
of $t_{m,\nu+m}(\mathbf{0}, \mathbf{I})$. The mean and the variance of this skewed $t$ distribution
are given by


$$E(\mathbf{X}) = \mu + \left(\frac{\nu}{\pi}\right)^{1/2} \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}\, \delta$$
and 
v v (T((v—1)/2)\" 
om = men (ey 


(provided $\nu > 2$), respectively, where $\delta = (\delta_1, \ldots, \delta_m)^T$. The
multivariate skewness measure $\beta_{1,m}$ (Mardia, 1970b) can be calculated in analytic
form. The expression does not simplify and involves nonlinear
interactions between the degrees of freedom ($\nu$) and the skewness parameter
$\delta$ when $\mathbf{D} = \delta\mathbf{I}$. However, $\beta_{1,m}$ approaches $\pm 1$ as $\delta \to \pm\infty$. Sahu et
al. (2000) discussed an application of this model in Bayesian regression
models.




5.9 Azzalini and Capitanio’s Skewed Multivariate t 
Distribution 


A slight extension of the skewed normal distribution given in (5.17) is

$$f_{\mathbf{Y}}(\mathbf{y}) = 2\phi_p(\mathbf{y} - \xi; \Sigma)\,\Phi\left(\alpha^T\mathbf{W}^{-1}(\mathbf{y} - \xi)\right), \tag{5.22}$$

where $\mathbf{y} \in \mathbb{R}^p$, $\xi \in \mathbb{R}^p$, $\mathbf{W} = \mathrm{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{pp}})$, and the rest is as
defined in (5.17). In the particular case $\xi = \mathbf{0}$, (5.22) reduces to (5.17).
Starting with a random vector $\mathbf{Y}$ having the pdf (5.22) with $\xi = \mathbf{0}$,
Azzalini and Capitanio (2002) defined a skewed $t$ variate as the scale
mixture

$$\mathbf{X} = \xi + \mathbf{Y}/\sqrt{V}, \tag{5.23}$$

where $\nu V$ is distributed independently of $\mathbf{Y}$ according to a chi-squared
distribution with degrees of freedom $\nu$. Simple calculations using a
preliminary result on Gamma variates show that the joint pdf of $\mathbf{X}$ is


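$$f_{\mathbf{X}}(\mathbf{x}) = 2 f_{\nu}(\mathbf{x} - \xi)\, F_{\nu+p}\left(\alpha^T\mathbf{W}^{-1}(\mathbf{x} - \xi)\sqrt{\frac{\nu+p}{Q + \nu}}\right),$$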


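where $Q = (\mathbf{x} - \xi)^T\mathbf{R}^{-1}(\mathbf{x} - \xi)$ and $f_k$, $F_k$ are as defined in (5.19).
Note that this pdf coincides with that of Branco and Dey (2001) given
in (5.20). In the standard case $\xi = \mathbf{0}$ and $\Sigma = \mathbf{R}$, the joint cdf of $\mathbf{X}$ can
be represented as

$$F_{\mathbf{X}}(\mathbf{x}) = 2\Pr\left(-U_0/\sqrt{V} \le 0,\ \mathbf{U}/\sqrt{V} \le \mathbf{x}\right), \tag{5.24}$$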


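where $(U_0, \mathbf{U}^T)^T$ has the $(p+1)$-dimensional normal distribution with
zero means and covariance matrix given by

$$\Omega^* = \begin{pmatrix} 1 & \delta^T \\ \delta & \mathbf{R} \end{pmatrix}, \qquad \delta = \frac{\mathbf{R}\alpha}{\sqrt{1 + \alpha^T\mathbf{R}\alpha}}.$$
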
The representation (5.24) can also be written in terms of a $(p+1)$-dimensional
$t$ distribution.
In the case $\xi = \mathbf{0}$, simple expressions for the moments of $\mathbf{X}$ can be
obtained. Defining

$$\mu = \delta \left(\frac{\nu}{\pi}\right)^{1/2} \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)},$$

where $\delta = \mathbf{R}\alpha/\sqrt{1 + \alpha^T\mathbf{R}\alpha}$,
and provided that $\nu > 1$, one obtains
E(Xk) = Wkkhk, 


Y 
E(X) = 5 Vki 


3v? 


E(Xk) = ug gie 


E(X) = Wau, 


E(XXT) = 


v{3- Tô 
Skewness (X4) = pk pE — 2 + wrn) 


and 


3p? Avy! p (3 ~ 575) 
v= 2)v=—4.— v—3 


2 
6v ye" u Ts\? a > 
UPET -3 (ô ô) y—2 KH 


v >4. 


Kurtosis (X) = i 


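Properties concerning linear and quadratic forms of $\mathbf{X}$ can also be
derived. For example, if $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{A}$ is an $m \times p$ constant matrix of rank
$m$, then the affine transformation $\mathbf{a} + \mathbf{A}\mathbf{X}$ will also follow the skewed $t$
distribution given by (5.23) with the parameters $\xi$, $\Sigma$, and $\alpha$ replaced
by $\mathbf{a} + \mathbf{A}\xi$, $\Sigma'$, and $\alpha'$, respectively (the degrees of freedom $\nu$ remain
unchanged), where

$$\Sigma' = \mathbf{A}\Sigma\mathbf{A}^T,$$

$$\alpha' = \frac{\mathbf{W}'(\Sigma')^{-1}\mathbf{B}^T\alpha}{\sqrt{1 + \alpha^T\left(\Sigma'' - \mathbf{B}(\Sigma')^{-1}\mathbf{B}^T\right)\alpha}},$$

$\mathbf{W}'$ is the diagonal matrix of the square roots of the diagonal elements of
$\Sigma'$ (the analog of $\mathbf{W}$ in (5.22)), $\mathbf{B} = \mathbf{W}^{-1}\Sigma\mathbf{A}^T$, and $\Sigma''$ is given by $\Sigma = \mathbf{W}\Sigma''\mathbf{W}$. Also,
for appropriate choices of $\mathbf{B}$ (a symmetric $p \times p$ matrix), the quadratic
form $Q = (\mathbf{X} - \xi)^T\mathbf{B}(\mathbf{X} - \xi)$ can be shown to have the $fF_{f,\nu}$ distribution
for some degrees of freedom $f$. For details see Azzalini and Capitanio
(1999) and Capitanio et al. (2002).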

A further extension of (5.17) examined independently by Arnold and
Beaver (2000) and Capitanio et al. (2002) is of the form

$$f(\mathbf{y}) = \phi_p(\mathbf{y} - \xi; \Sigma)\,\Phi\left(\alpha_0 + \alpha^T\mathbf{W}^{-1}(\mathbf{y} - \xi)\right) / \Phi(\tau), \tag{5.25}$$

where $\mathbf{y} \in \mathbb{R}^p$, $\tau \in \mathbb{R}$, $\alpha_0 = \tau/\sqrt{1 - \delta^T\mathbf{R}^{-1}\delta}$, and the rest are as defined
in (5.22). In the particular case $\tau = 0$, (5.25) reduces to (5.22). Taking
$\mathbf{Y}$ in (5.23) to have the pdf (5.25) with $\xi = \mathbf{0}$, one obtains an extended
skewed $t$ distribution for $\mathbf{X}$. The corresponding joint pdf for $\mathbf{X}$ is quite
complicated, but the joint cdf can be represented as

$$F(\mathbf{x}) = \Pr\left(-(U_0 + \tau)/\sqrt{V} \le 0,\ \mathbf{U}/\sqrt{V} \le \mathbf{x}\right) / \Phi(\tau)$$

(compare with (5.24)). Moreover, for the particular case $\xi = \mathbf{0}$, the first-
and second-order moments are


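$$E(\mathbf{X}) = E\left(1/\sqrt{V}\right) \eta_1(\tau)\,\mathbf{W}\delta$$

(provided $\nu > 1$) and

$$E(\mathbf{X}\mathbf{X}^T) = \frac{\nu}{\nu - 2}\left[\Sigma + \left\{\eta_2(\tau) + \eta_1^2(\tau)\right\}\mathbf{W}\delta\,(\mathbf{W}\delta)^T\right]$$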


(provided v > 2), respectively, where 
dE 


for k = 0,1,2,.... 


5.10 Jones’ Skewed Multivariate t Distribution 
The univariate Student’s $t$ distribution has the pdf

$$f(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}. \tag{5.26}$$

By replacing (5.26) with the skewed univariate $t$ pdf (4.38) in a
multivariate distribution, Jones (2002c) introduced a new skewed multivariate
$t$ distribution that we shall describe in this section. Let $\mathbf{X}$ be a $p$-variate
random vector having the standard multivariate $t$ distribution with the
joint pdf given by

$$f(\mathbf{x}) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}\left(1 + \frac{\mathbf{x}^T\mathbf{x}}{\nu}\right)^{-(\nu+p)/2}. \tag{5.27}$$

The univariate marginals of this are (5.26). Multiplying (5.27) by (4.38)
and dividing by (5.26) yields Jones’ (2002c) skewed multivariate $t$
distribution. The corresponding joint pdf is


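$$f(\mathbf{x}) = \frac{2^{1-a-c}\,\Gamma((\nu+p)/2)\,\Gamma(a+c)}{(\pi\nu)^{(p-1)/2}\,\Gamma(a)\,\Gamma(c)\,\Gamma((\nu+1)/2)\,\sqrt{a+c}} \left(1 + \frac{x_1^2}{\nu}\right)^{(\nu+1)/2} \left(1 + \frac{x_1}{\sqrt{a+c+x_1^2}}\right)^{a+1/2}$$

$$\qquad \times \left(1 - \frac{x_1}{\sqrt{a+c+x_1^2}}\right)^{c+1/2} \left\{1 + \frac{\mathbf{x}^T\mathbf{x}}{\nu}\right\}^{-(\nu+p)/2}. \tag{5.28}$$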


This reduces to (5.27) for $a = c = \nu/2$. In the bivariate case, (5.28)
is a distribution with (i) a skewed $t$ marginal with parameters $a$ and $c$
in the $x_1$ direction; (ii) conditional distributions of $X_2 \mid X_1$ that match
those of the bivariate $t$ distribution, being $t$ distributions on $\nu + 1$
degrees of freedom scaled by a factor of $\sqrt{(\nu + x_1^2)/(\nu + 1)}$; and (iii) a
diagonal correlation matrix. Another new multivariate distribution can
be obtained by replacing (5.26) by the pdf of the Gumbel distribution:
$\exp(-x - \exp(-x))$. This results in the joint pdf

$$f(\mathbf{x}) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{(p-1)/2}\,\Gamma((\nu+1)/2)} \exp\left(-x_1 - \exp(-x_1)\right) \left(1 + \frac{x_1^2}{\nu}\right)^{(\nu+1)/2} \left\{1 + \frac{\mathbf{x}^T\mathbf{x}}{\nu}\right\}^{-(\nu+p)/2}. \tag{5.29}$$

With respect to the correlation structure, this pdf has much in common
with (5.28). But the conditional distribution of $X_1$ given $X_2, \ldots, X_p$
and the marginals are different.
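
The statement that (5.28) reduces to (5.27) for $a = c = \nu/2$ can be checked numerically. The sketch below codes (5.28) as the product "(5.27) times (4.38) divided by (5.26)", assuming that (4.38) is the skewed $t$ density with kernel $(1 + s)^{a+1/2}(1 - s)^{c+1/2}$, $s = x/\sqrt{a + c + x^2}$, and normalizing constant $2^{1-a-c}\,\Gamma(a+c)/\{\Gamma(a)\Gamma(c)\sqrt{a+c}\}$; the function names are illustrative.

```python
import math

def mvt_pdf(x, nu):
    """Standard p-variate t density (5.27)."""
    p = len(x)
    log_c = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - p / 2 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + p) / 2 * math.log1p(sum(v * v for v in x) / nu))

def jones_pdf(x, nu, a, c):
    """Skewed multivariate t density in the form of (5.28)."""
    p = len(x)
    x1 = x[0]
    s = x1 / math.sqrt(a + c + x1 * x1)
    log_c = ((1 - a - c) * math.log(2.0) + math.lgamma((nu + p) / 2)
             + math.lgamma(a + c) - (p - 1) / 2 * math.log(math.pi * nu)
             - math.lgamma(a) - math.lgamma(c) - math.lgamma((nu + 1) / 2)
             - 0.5 * math.log(a + c))
    log_k = ((nu + 1) / 2 * math.log1p(x1 * x1 / nu)
             + (a + 0.5) * math.log1p(s) + (c + 0.5) * math.log1p(-s)
             - (nu + p) / 2 * math.log1p(sum(v * v for v in x) / nu))
    return math.exp(log_c + log_k)

point = [0.7, -1.2, 0.4]
nu = 5.0
reduced = jones_pdf(point, nu, nu / 2, nu / 2)   # symmetric case a = c = nu/2
standard = mvt_pdf(point, nu)
print(reduced, standard)

# With a > c the density is larger at x1 = 1 than at x1 = -1.
right = jones_pdf([1.0, 0.0], 3.0, 6.0, 2.0)
left = jones_pdf([-1.0, 0.0], 3.0, 6.0, 2.0)
print(right, left)
```

The agreement in the symmetric case rests on the duplication formula for the gamma function, which turns the constant of (5.28) into that of (5.27).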

Fig. 5.2. Jones’ skewed multivariate $t$ pdf (5.28) for $p = 2$ and (a) $a = 6$,
$\nu = 3$, and $c = 2$; and (b) $a = 2$, $\nu = 3$, and $c = 6$

Jones (2001a) noted that, if $Y$ has the beta distribution with
parameters $a$ and $c$, then $X = \sqrt{a+c}\,Y/\sqrt{1 - Y^2}$ has the skewed univariate $t$
distribution given in (4.38). Jones (2002c) observed a similar
relationship between the joint beta pdf


21—0=cP (a + c)I'(b) a-t ai T 
Tar tyre tt a a a E 


a>0, b>1/2, c>0 


and the skewed multivariate $t$ distribution given in (5.28) when $p = 2$
and $b = \nu/2 + 1$; namely, if $(Y_1, Y_2)$ have the former distribution, then


(XXa) = ViiVJate VYevv +Yř(a+c-v) 
Hen T -Y y1- Y-Y; 
has the distribution (5.28) for p = 2. 
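
The univariate beta construction can be sketched by simulation. The code below reads $Y$ as a beta variate rescaled to $(-1, 1)$, that is, $Y = 2B - 1$ with $B \sim \mathrm{Beta}(a, c)$ on $(0, 1)$; this reading is an assumption made here so that $X$ ranges over the whole real line. The mean used for comparison, $(a - c)\sqrt{a + c}\;\Gamma(a - 1/2)\Gamma(c - 1/2)/\{2\Gamma(a)\Gamma(c)\}$, follows by direct integration under the same assumption.

```python
import math
import random

def skew_t_from_beta(a, c, rng):
    """X = sqrt(a + c) * Y / sqrt(1 - Y^2) with Y = 2B - 1, B ~ Beta(a, c)."""
    b = rng.betavariate(a, c)   # beta variate on (0, 1)
    y = 2.0 * b - 1.0           # rescaled to (-1, 1)
    return math.sqrt(a + c) * y / math.sqrt(1.0 - y * y)

def skew_t_mean(a, c):
    """Mean of the transformed variable (requires a > 1/2 and c > 1/2)."""
    log_g = (math.lgamma(a - 0.5) + math.lgamma(c - 0.5)
             - math.lgamma(a) - math.lgamma(c))
    return 0.5 * (a - c) * math.sqrt(a + c) * math.exp(log_g)

a, c, n = 6.0, 3.0, 40000
rng = random.Random(2024)
draws = [skew_t_from_beta(a, c, rng) for _ in range(n)]
sample_mean = sum(draws) / n
print(sample_mean, skew_t_mean(a, c))
```

For $a = c$ the mean is zero and the construction returns a symmetric ($t$-type) variable; $a > c$ gives positive skewness.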


Fig. 5.3. Jones’ skewed multivariate $t$ pdf (5.29) for $p = 2$ and (a) $\nu = 1$; and
(b) $\nu = 20$

In the univariate case, $F$ and skewed $t$ (equation (4.38)) distributions
are linked in two ways that produce identical results: (i) a random
variable with any one distribution can be obtained by transforming a
random variable from the other; (ii) a random variable with each
distribution can be written as a function of two independent chi-squared
random variables. If $W_i \sim \chi^2_{2\nu_i}$, $F_i \sim F_{2\nu_i, 2\nu_0}$ and $T_i$ is a random variable
with the pdf (4.38), then


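$$T_i = \frac{\sqrt{\omega_i}}{2}\left(\sqrt{\frac{\nu_i F_i}{\nu_0}} - \sqrt{\frac{\nu_0}{\nu_i F_i}}\right), \tag{5.30}$$

$$F_i = \frac{\nu_0}{\nu_i\,\omega_i}\left(T_i + \sqrt{\omega_i + T_i^2}\right)^2,$$

$$F_i = \frac{\nu_0 W_i}{\nu_i W_0},$$

and

$$T_i = \frac{\sqrt{\omega_i}}{2}\,\frac{W_i - W_0}{\sqrt{W_i W_0}},$$

where $\omega_i = \nu_0 + \nu_i$. By extending this relationship between the
univariate $F$ and the skewed univariate $t$, Jones (2001b) introduced another
skewed multivariate $t$ distribution. It is known (see, for example,
Johnson and Kotz, 1972, Chapter 40, and Hutchinson and Lai, 1990, Section
6.3) that the joint pdf of the random variables $F_i$, $i = 1, \ldots, p$ is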


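$$f(f_1, \ldots, f_p) = \frac{\Gamma(n)}{\Gamma(\nu_0)\cdots\Gamma(\nu_p)} \prod_{i=1}^{p}\left(\frac{\nu_i}{\nu_0}\right)^{\nu_i} f_i^{\nu_i - 1} \left(1 + \sum_{j=1}^{p}\frac{\nu_j f_j}{\nu_0}\right)^{-n}, \qquad f_1 > 0, \ldots, f_p > 0, \tag{5.31}$$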


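where $n = \nu_0 + \cdots + \nu_p$. Applying the transformation (5.30) to (5.31),
Jones (2001b) obtained the joint pdf of $T_i$ as

$$f(t_1, \ldots, t_p) = \frac{2^p\,\Gamma(n)}{\Gamma(\nu_0)\cdots\Gamma(\nu_p)} \prod_{i=1}^{p}\frac{\left(t_i + \sqrt{\omega_i + t_i^2}\right)^{2\nu_i}}{\omega_i^{\nu_i}\sqrt{\omega_i + t_i^2}} \left\{1 + \sum_{j=1}^{p}\frac{\left(t_j + \sqrt{\omega_j + t_j^2}\right)^2}{\omega_j}\right\}^{-n},$$

$$t_1 \in \mathbb{R}, \ldots, t_p \in \mathbb{R}. \tag{5.32}$$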
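
The link (5.30) and the form of (5.32) can be checked numerically. The sketch below assumes the relations $T_i = (\sqrt{\omega_i}/2)(\sqrt{G_i} - 1/\sqrt{G_i})$ with $G_i = \nu_i F_i/\nu_0$ and $\omega_i = \nu_0 + \nu_i$, and compares the $p = 1$ case of (5.32) with the skewed $t$ density assumed for (4.38) with $a = \nu_1$ and $c = \nu_0$; the function names and the list convention `nu = [nu_0, nu_1, ..., nu_p]` are illustrative.

```python
import math

def t_from_f(f, nu_i, nu_0):
    """Transformation (5.30) from an F variate to a skewed t variate."""
    omega = nu_0 + nu_i
    g = nu_i * f / nu_0
    return 0.5 * math.sqrt(omega) * (math.sqrt(g) - 1.0 / math.sqrt(g))

def f_from_t(t, nu_i, nu_0):
    """Inverse of (5.30)."""
    omega = nu_0 + nu_i
    return nu_0 * (t + math.sqrt(omega + t * t)) ** 2 / (nu_i * omega)

def jones_joint_pdf(t, nu):
    """Joint pdf in the form of (5.32); nu = [nu_0, nu_1, ..., nu_p]."""
    n = sum(nu)
    log_pdf = (math.lgamma(n) - sum(math.lgamma(v) for v in nu)
               + len(t) * math.log(2.0))
    total = 1.0
    for t_i, nu_i in zip(t, nu[1:]):
        omega = nu[0] + nu_i
        root = math.sqrt(omega + t_i * t_i)
        u = t_i + root
        log_pdf += 2 * nu_i * math.log(u) - nu_i * math.log(omega) - math.log(root)
        total += u * u / omega
    return math.exp(log_pdf - n * math.log(total))

def skew_t_pdf(t, a, c):
    """Univariate skewed t density, in the form assumed for (4.38)."""
    s = t / math.sqrt(a + c + t * t)
    log_c = (math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
             - (a + c - 1) * math.log(2.0) - 0.5 * math.log(a + c))
    return math.exp(log_c + (a + 0.5) * math.log1p(s) + (c + 0.5) * math.log1p(-s))

round_trip = [f_from_t(t_from_f(f, 3.0, 2.0), 3.0, 2.0) for f in (0.3, 1.0, 2.5)]
pairs = [(jones_joint_pdf([t], [2.0, 3.5]), skew_t_pdf(t, 3.5, 2.0))
         for t in (-2.0, 0.0, 1.5)]
print(round_trip)
print(pairs)
```

The same function evaluates the bivariate surfaces by passing two values of $t$ and three degrees-of-freedom parameters, for example `jones_joint_pdf([0.5, -0.2], [2.0, 4.0, 4.0])`.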


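The univariate marginals of this pdf take the form of (4.38). The
conditional pdf of $T_i$ given any subset $T_{i_1}, \ldots, T_{i_{p_2}}$ of the other variables,
$p_2 < p$, is proportional to

$$\frac{\left(t_i + \sqrt{\omega_i + t_i^2}\right)^{2\nu_i}}{\sqrt{\omega_i + t_i^2}\left\{1 + K^{-1}\omega_i^{-1}\left(t_i + \sqrt{\omega_i + t_i^2}\right)^2\right\}^{\omega_i + \nu_{i_1} + \cdots + \nu_{i_{p_2}}}},$$

where

$$K = 1 + \sum_{\ell=1}^{p_2} \omega_{i_\ell}^{-1}\left(t_{i_\ell} + \sqrt{\omega_{i_\ell} + t_{i_\ell}^2}\right)^2.$$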


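The regression of $T_i$ given $T_{i_1}, \ldots, T_{i_{p_2}}$ takes the nonlinear form

$$E\left(T_i \mid T_{i_1} = t_{i_1}, \ldots, T_{i_{p_2}} = t_{i_{p_2}}\right) = \frac{\sqrt{\omega_i}\,\Gamma(\nu_i - 1/2)\,\Gamma(\psi - 1/2)}{2\,\Gamma(\nu_i)\,\Gamma(\psi)} \left\{(\nu_i - 1/2)\sqrt{K} - \frac{\psi - 1/2}{\sqrt{K}}\right\},$$

where $\psi = \nu_0 + \nu_{i_1} + \cdots + \nu_{i_{p_2}}$. Note that the corresponding relation
for the multivariate $F$ distribution in (5.31) is linear. If $T_1, \ldots, T_m$
denote any $m$ of the $p$ $T_i$’s (with their degrees of freedom correspondingly
renumbered as $\nu_1, \ldots, \nu_m$ along with $\nu_0$), then the product moment of
$T_1^{\lambda_1} \cdots T_m^{\lambda_m}$ is

$$E\left(\prod_{i=1}^{m} T_i^{\lambda_i}\right) = \frac{\prod_{i=1}^{m}\left(\sqrt{\omega_i}/2\right)^{\lambda_i}}{\prod_{i=0}^{m}\Gamma(\nu_i)} \sum_{j_1=0}^{\lambda_1}\cdots\sum_{j_m=0}^{\lambda_m} (-1)^{j_1 + \cdots + j_m} \left\{\prod_{i=1}^{m}\binom{\lambda_i}{j_i}\Gamma\left(\nu_i + \frac{\lambda_i}{2} - j_i\right)\right\}$$

$$\qquad \times\, \Gamma\left(\nu_0 - \sum_{i=1}^{m}\left(\frac{\lambda_i}{2} - j_i\right)\right),$$

provided that $\nu_i > \lambda_i/2$, $i = 1, \ldots, m$ and $\nu_0 > (\lambda_1 + \cdots + \lambda_m)/2$. In
particular, the variances and the covariances are given by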


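$$\mathrm{Var}(T_i) = \frac{\omega_i}{4}\left[\frac{(\nu_0 - \nu_i)^2 + \omega_i - 2}{(\nu_0 - 1)(\nu_i - 1)} - \left\{\frac{\Gamma(\nu_0 - 1/2)\,\Gamma(\nu_i - 1/2)\,(\nu_0 - \nu_i)}{\Gamma(\nu_0)\,\Gamma(\nu_i)}\right\}^2\right]$$

(provided $\nu_0 > 1$ and $\nu_i > 1$) and

$$\mathrm{Cov}(T_i, T_j) = \frac{\sqrt{\omega_i\omega_j}\,\Gamma(\nu_i - 1/2)\,\Gamma(\nu_j - 1/2)}{4\,\Gamma(\nu_i)\,\Gamma(\nu_j)}$$

$$\qquad \times \left[\frac{(\nu_i - 1/2)(\nu_j - 1/2)}{\nu_0 - 1} + \nu_0 + 1 - \nu_i - \nu_j - (\nu_i - \nu_0)(\nu_j - \nu_0)\frac{\Gamma^2(\nu_0 - 1/2)}{\Gamma^2(\nu_0)}\right]$$

(provided $\nu_i > 1/2$, $\nu_j > 1/2$ and $\nu_0 > 1$), respectively. In the particular
case $\nu_0 = \cdots = \nu_p = \nu/2$, (5.32) reduces to

$$f(t_1, \ldots, t_p) = \frac{2^p\,\nu^{\nu/2}\,\Gamma((p+1)\nu/2)}{\Gamma^{p+1}(\nu/2)} \prod_{i=1}^{p}\frac{\left(t_i + \sqrt{\nu + t_i^2}\right)^{\nu}}{\sqrt{\nu + t_i^2}} \left\{\nu + \sum_{j=1}^{p}\left(t_j + \sqrt{\nu + t_j^2}\right)^2\right\}^{-(p+1)\nu/2},$$

$$t_1 \in \mathbb{R}, \ldots, t_p \in \mathbb{R}. \tag{5.33}$$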


Jones (2001b) referred to this distribution as the symmetric multivariate
$t$ distribution. Note that all of the marginals of (5.33) have the Student’s
$t$ distribution with degrees of freedom $\nu$. The correlation between any
two $T_i$’s in (5.33) takes the simple form

$$\mathrm{Corr}(T_i, T_j) = \frac{2\nu - 3}{8}\left\{\frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}\right\}^2,$$

provided that $\nu > 2$.
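
The correlation in the symmetric case can be explored numerically. The sketch below assumes the chi-squared representation $T_i = (\sqrt{\nu}/2)(W_i - W_0)/\sqrt{W_i W_0}$ with independent $W_0, W_1, W_2 \sim \chi^2_{\nu}$, and uses the closed form $\{(2\nu - 3)/8\}\{\Gamma((\nu-1)/2)/\Gamma(\nu/2)\}^2$, which is derived here from that representation rather than quoted; both functions are illustrative.

```python
import math
import random

def symmetric_t_correlation(nu):
    """Closed-form correlation between two components (nu > 2)."""
    g = math.exp(math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2))
    return (2 * nu - 3) / 8 * g * g

def simulated_correlation(nu, n, seed):
    """Sample correlation of (T_1, T_2) built from shared chi-squared W_0."""
    rng = random.Random(seed)
    s1 = s2 = s11 = s22 = s12 = 0.0
    for _ in range(n):
        w0 = rng.gammavariate(nu / 2, 2.0)  # chi-squared with nu degrees of freedom
        w1 = rng.gammavariate(nu / 2, 2.0)
        w2 = rng.gammavariate(nu / 2, 2.0)
        t1 = 0.5 * math.sqrt(nu) * (w1 - w0) / math.sqrt(w1 * w0)
        t2 = 0.5 * math.sqrt(nu) * (w2 - w0) / math.sqrt(w2 * w0)
        s1 += t1
        s2 += t2
        s11 += t1 * t1
        s22 += t2 * t2
        s12 += t1 * t2
    cov = s12 / n - (s1 / n) * (s2 / n)
    var1 = s11 / n - (s1 / n) ** 2
    var2 = s22 / n - (s2 / n) ** 2
    return cov / math.sqrt(var1 * var2)

exact = symmetric_t_correlation(8.0)
estimate = simulated_correlation(8.0, 40000, seed=7)
print(exact, estimate)
print([round(symmetric_t_correlation(v), 4) for v in (3.0, 10.0, 100.0, 1000.0)])
```

As $\nu$ grows, the closed form approaches 1/2, in line with the limiting intraclass correlation of the normal limit.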
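The limiting form of (5.32) as $\nu_0 \to \infty$ while $\nu_i > 1$ remains fixed,
$i = 1, \ldots, p$, can be shown to be

$$\left(\prod_{i=1}^{p}\sigma_i\right) g\left(\mu_1 + \sigma_1 t_1, \ldots, \mu_p + \sigma_p t_p\right),$$

where

$$g(t_1, \ldots, t_p) = 2^p \prod_{i=1}^{p}\frac{t_i^{-(2\nu_i + 1)}}{\Gamma(\nu_i)}\exp\left(-\frac{1}{t_i^2}\right),$$

$$\mu_i = \frac{\Gamma(\nu_i - 1/2)}{\Gamma(\nu_i)},$$

and

$$\sigma_i^2 = \frac{1}{\nu_i - 1} - \frac{\Gamma^2(\nu_i - 1/2)}{\Gamma^2(\nu_i)}.$$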

Note that $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of the $\sqrt{2/\chi^2_{2\nu_i}}$
distribution. When $\nu_0$ remains fixed but $\nu_1, \ldots, \nu_p \to \infty$, the marginals
of (5.32) tend to the $\sqrt{2/\chi^2_{2\nu_0}}$ distribution, but the correlations between the
$T_i$’s tend to 1 and the joint distribution becomes degenerate. When all
$\nu_0, \nu_1, \ldots, \nu_p \to \infty$, all of the marginals tend to the normal distribution,
but the form of the limiting joint distribution will depend on the
specific relationships between the $\nu$’s. The limit of (5.33) as $\nu \to \infty$
is the multivariate normal distribution with zero means, unit variances,
and an intraclass correlation structure with correlation 1/2.

Fig. 5.4. Jones’ skewed multivariate $t$ pdf (5.32) for $p = 2$ and (a) $\nu_0 = 2$,
$\nu_1 = 4$, and $\nu_2 = 4$; (b) $\nu_0 = 2$, $\nu_1 = 20$, and $\nu_2 = 1$; (c) $\nu_0 = 2$, $\nu_1 = 1$, and
$\nu_2 = 20$; and (d) $\nu_0 = \nu_1 = \nu_2 = 2$


5.11 Matrix-Variate t Distribution 


The matrix-variate $t$ distribution, motivated by applications in Bayesian
inference, is the product of James Dickey’s research in the mid-1960s. We
need the following terminology to discuss its mathematical properties.
Let $\mu$ be a $p \times q$ constant matrix, let $\mathbf{R} > 0$ be a $p \times p$ matrix, and let
$\mathbf{Q} > 0$ be a $q \times q$ matrix. For $m > p + q - 1$ define


$$k(m, p, q) = \pi^{pq/2}\,\frac{\Gamma_p((m - q)/2)}{\Gamma_p(m/2)}, \tag{5.34}$$


where

$$\Gamma_p(z) = \pi^{p(p-1)/4}\,\Gamma(z)\,\Gamma\left(z - \frac{1}{2}\right)\cdots\Gamma\left(z - \frac{p}{2} + \frac{1}{2}\right)$$

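is the generalized gamma function. Furthermore, for real or complex
constants $a_1, \ldots, a_p$ and $b_1, \ldots, b_q$ and for random matrices $\mathbf{S}$ and $\mathbf{T}$,
define the general hypergeometric functions (see Constantine, 1963)

$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; \mathbf{S}) = \sum_{k=0}^{\infty}\sum_{\kappa}\frac{(a_1)_\kappa\cdots(a_p)_\kappa}{(b_1)_\kappa\cdots(b_q)_\kappa}\,\frac{C_\kappa(\mathbf{S})}{k!} \tag{5.35}$$

and

$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; \mathbf{S}, \mathbf{T}) = \sum_{k=0}^{\infty}\sum_{\kappa}\frac{(a_1)_\kappa\cdots(a_p)_\kappa}{(b_1)_\kappa\cdots(b_q)_\kappa}\,\frac{C_\kappa(\mathbf{S})\,C_\kappa(\mathbf{T})}{C_\kappa(\mathbf{I}_m)\,k!}, \tag{5.36}$$

where $\kappa = \{k_1, \ldots, k_m\}$, $k_1 \ge k_2 \ge \cdots \ge k_m \ge 0$, $k_1 + k_2 + \cdots + k_m = k$,

$$(z)_\kappa = \frac{\Gamma_m(z, \kappa)}{\Gamma_m(z)},$$

$$\Gamma_m(z, \kappa) = \pi^{m(m-1)/4}\,\Gamma(z + k_1)\,\Gamma\left(z + k_2 - \frac{1}{2}\right)\cdots\Gamma\left(z + k_m - \frac{m-1}{2}\right),$$

and $C_\kappa(\mathbf{S})$ and $C_\kappa(\mathbf{T})$ are symmetric homogeneous polynomials of degree
$k$ in the latent roots of $\mathbf{S}$ and $\mathbf{T}$, respectively.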

A $p \times q$ random matrix $\mathbf{X}$ is said to have the matrix-variate $t$
distribution with parameters $\mu$, $\mathbf{R}$, $\mathbf{Q}$, and $m$ if its joint pdf is

$$f(\mathbf{X}) = \frac{1}{k(m, p, q)}\,|\mathbf{Q}|^{(m-p)/2}\,|\mathbf{R}|^{-q/2}\,\left|\mathbf{Q} + (\mathbf{X} - \mu)^T\mathbf{R}^{-1}(\mathbf{X} - \mu)\right|^{-m/2} \tag{5.37}$$

(Dickey, 1966a, 1967b). If $\mu = \mathbf{0}$, then we say that $\mathbf{X}$ has the central
matrix-variate $t$ distribution with parameters $\mathbf{R}$, $\mathbf{Q}$, and $m$. Otherwise,
we refer to the distribution as a noncentral matrix-variate $t$. The usual
multivariate $t$ distribution (1.1) is the special case of (5.37) for $p = 1$
(single row) or $q = 1$ (single column). It is also known that the
particular case of (5.37) for $\mu = \mathbf{0}$ and $\mathbf{R} = \mathbf{I}_p$ is a mixture of the normal
density with zero means and covariance matrix $\mathbf{I}_p \otimes \mathbf{V}$, the mixing being over the $q \times q$
positive definite scale matrix $\mathbf{V}$. Densities of the form (5.37) appear in
the frequentist approach to normal regression as the distribution of the
Studentized error, both the error in the least squares estimate of the
coefficients matrix and the error in the corresponding predictor of a
future data array (Cornish, 1954; Kshirsagar, 1961; Kiefer and Schwartz,
1965). In the Bayesian conjugate-prior and diffuse-prior analyses for the
same sampling models, it arises as the marginal prior or posterior
distribution of the unknown coefficients matrix, and also as the predictive
distribution of a future data array (Geisser and Cornfield, 1963; Ando
and Kaufman, 1965; Geisser, 1965; Dickey, 1967b, Section 4; Zellner,
1971, Chapter 8; Press, 1972, Section 8.6). More recently, Van Dijk
(1985, 1986) discussed applications of (5.37) in the linear simultaneous
equation model (SEM), which is one of the best-known models in
econometrics. The SEM is used in several areas, for instance, in
microeconomic modeling for the description of the operation of a
market for a particular economic commodity and in macroeconomic
modeling for the description of the interrelations between a large number of
macroeconomic variables.
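
The normalizing constant of (5.37) can be sketched with the standard library alone. The code below assumes that $k(m, p, q) = \pi^{pq/2}\,\Gamma_p((m-q)/2)/\Gamma_p(m/2)$ and that the quadratic form in (5.37) involves $\mathbf{R}^{-1}$; it checks that the constant is unchanged when the roles of $p$ and $q$ are exchanged, and that the case $p = q = 1$ with $R = 1$, $Q = \nu$, and $m = \nu + 1$ returns the Student’s $t$ density. All function names are illustrative.

```python
import math

def log_multivariate_gamma(z, p):
    """log Gamma_p(z) = log{ pi^(p(p-1)/4) * prod_{j=0}^{p-1} Gamma(z - j/2) }."""
    return (p * (p - 1) / 4 * math.log(math.pi)
            + sum(math.lgamma(z - j / 2) for j in range(p)))

def log_k(m, p, q):
    """log k(m, p, q) in the assumed form; requires m > p + q - 1."""
    return (p * q / 2 * math.log(math.pi)
            + log_multivariate_gamma((m - q) / 2, p)
            - log_multivariate_gamma(m / 2, p))

def matrix_t_pdf_scalar(x, m, r, q_scale, mu=0.0):
    """Density (5.37) for p = q = 1, where determinants are plain numbers."""
    log_pdf = (-log_k(m, 1, 1) + (m - 1) / 2 * math.log(q_scale)
               - 0.5 * math.log(r)
               - m / 2 * math.log(q_scale + (x - mu) ** 2 / r))
    return math.exp(log_pdf)

def t_pdf(x, nu):
    """Univariate Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

m, p, q = 7.0, 2, 3
swapped = (p * q / 2 * math.log(math.pi)
           + log_multivariate_gamma((m - p) / 2, q)
           - log_multivariate_gamma(m / 2, q))
print(log_k(m, p, q), swapped)

nu = 5.0
print(matrix_t_pdf_scalar(0.9, nu + 1, 1.0, nu), t_pdf(0.9, nu))
```

For larger $p$ and $q$ the density additionally needs determinants and a matrix inverse, which would call for a linear-algebra library.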

If $\mathbf{X}$ has the central matrix-variate $t$ distribution with parameters $\mathbf{R}$,
$\mathbf{Q}$ and $m$, then it can be represented in numerous ways, as described by
Dickey (1967b) and Dawid (1981). The following results (due to Dickey,
1967b, and Dickey et al., 1986) concern the conditional and the marginal
distributions of $\mathbf{X}$:


e If X = (X,,X2)", then the conditional distribution of X,, given 
Xo, is the matrix-variate t with parameters —R1,;Rjy X2, RI}, Q + 
XTR Xo, and m. 

e If X = (X,, Xe), then the conditional distribution of X1, given Xo, is 
a matrix-variate t with parameters X2Q3) Q21, (R + X2Q3p X7)7}, 
Qu - QQZ Qz, and m. 

• If $\mathbf{X} = (\mathbf{X}_1^T, \mathbf{X}_2^T)^T$, where $\mathbf{X}_i$ is $p_i \times q$, then the marginal distribution of
$\mathbf{X}_2$ is a central matrix-variate $t$ with parameters $\mathbf{R}_{22}$, $\mathbf{Q}$ and $m - p_1$.
In the particular case $\mathbf{X}^T = (\mathbf{x}_1, \ldots, \mathbf{x}_p)$, each row $\mathbf{x}_i^T$ has the central
multivariate $t$ distribution with degrees of freedom $m - p - q + 1$ and
correlation matrix $r_{ii}\mathbf{Q}/(m - p - q + 1)$. A consequence of this is that
the density (5.37) of $\mathbf{X}$ can be written as the product of conditional
multivariate $t$ distributions of the rows of $\mathbf{X}$, that is,

$$f(\mathbf{X}) = f(\mathbf{x}_1)\, f(\mathbf{x}_2 \mid \mathbf{x}_1) \cdots f(\mathbf{x}_p \mid \mathbf{x}_1, \ldots, \mathbf{x}_{p-1}).$$

• If $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)$, where $\mathbf{X}_j$ is $p \times q_j$, then the marginal distribution of
$\mathbf{X}_2$ is a central matrix-variate $t$ with parameters $\mathbf{R}^{-1}$, $\mathbf{Q}_{22}$, and $m - q_1$.
In the particular case $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_q)$, each column $\mathbf{x}_j$ has the central
multivariate $t$ distribution with degrees of freedom $m - p - q + 1$ and
correlation matrix $q_{jj}\mathbf{R}/(m - p - q + 1)$. A consequence of this is that
the density (5.37) of $\mathbf{X}$ can be written as the product of conditional
multivariate $t$ distributions of the columns of $\mathbf{X}$, that is,

$$f(\mathbf{X}) = f(\mathbf{x}_1)\, f(\mathbf{x}_2 \mid \mathbf{x}_1) \cdots f(\mathbf{x}_q \mid \mathbf{x}_1, \ldots, \mathbf{x}_{q-1}).$$
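• If $\mathbf{X}$ is doubly partitioned as

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_{11} & \mathbf{X}_{12} \\ \mathbf{X}_{21} & \mathbf{X}_{22} \end{pmatrix},$$

where $\mathbf{X}_{ij}$ is $p_i \times q_j$ with $p_1 + p_2 = p$ and $q_1 + q_2 = q$, then the
conditional distribution of $\mathbf{X}_{12}$, given $\mathbf{X}_{11}$ and $\mathbf{X}_{21}$, is a matrix-variate $t$ with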
parameters R7,(R7,)-!X7,, Qu + XIRI Xu, Ree — Rei RI} Riz, 
and m+qı —p—q+1. (Here, the partitions of R and Q correspond to 


the partition of $\mathbf{X}$.) Since this depends only on $\mathbf{X}_{11}$, it follows that
$\mathbf{X}_{12}$ and $\mathbf{X}_{21}$ given $\mathbf{X}_{11}$ are conditionally independent.


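The following results (due to Javier and Gupta, 1985, and Dickey
et al., 1986) concern the distributions of the quadratic forms $\mathbf{X}\mathbf{A}\mathbf{X}^T$
and $\mathbf{A}\mathbf{X}\mathbf{B}$ when $\mathbf{X}$ has the central matrix-variate $t$ distribution with
parameters $\mathbf{R}$, $\mathbf{Q}$, and $m$:


• If $\mathbf{A} > 0$ is $q \times q$, then the pdf of $\mathbf{W} = \mathbf{X}\mathbf{A}\mathbf{X}^T$ is given by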
1 = m= = erent Fe 
f(W) = konna a p/2 [R|‘ 0/2 IQ] p/2 [w]e p—1)/2 


x |R + W7”? 
x Fo (F (R+W) W, h - (QA)™), 


where $\mathbf{W} > 0$, $k(m, p, q)$ is given by (5.34), and ${}_1F_0$ is as defined in
(5.35). An immediate consequence of this result is that


/ Jw re -D/2 IR + wl? 
W>o 


xF (5; (R+ W)! W, L - (Qa)*) dw 




2 T aori- q)/2) |A|?/? [R 7079/2 QP? j 
Hence the hth moment of | W | is 
I ((g + 2h)/2)T ((m — q — 2h)/2) 
I (m/2) 
x [APA IRI QP. (5.38) 


pwe 


Further using the fact that an F-distribution is uniquely determined 
by its moments, it follows that | W | can be written as a product of 
q independent univariate F’s, that is, 


|W] ~ I[F@-G-9,m-a-G-1). 


For the special case A = I, and p = q, (5.38) gives the hth moment 
of XXT. 

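• If $\mathbf{A} > 0$ is $p \times p$ and $\mathbf{B} > 0$ is $q \times q$, then $\mathbf{A}\mathbf{X}\mathbf{B}$ has the central
matrix-variate $t$ distribution with parameters $\mathbf{B}^T\mathbf{Q}\mathbf{B}$, $\mathbf{A}\mathbf{R}^{-1}\mathbf{A}^T$, and
$m$, $m > p + q - 1$.

• If $\mathbf{A} > 0$ is $p \times p$ and $\mathbf{B}$ is a $q \times r$ rectangular matrix, then $\mathbf{A}\mathbf{X}\mathbf{B}$
has the central matrix-variate $t$ distribution with parameters $\mathbf{B}^T\mathbf{Q}\mathbf{B}$,
$\mathbf{A}\mathbf{R}^{-1}\mathbf{A}^T$, and $m$, $m > p + r - 1$.

• If $\mathbf{a}$ is a $q \times 1$ vector, then $\mathbf{a}^T\mathbf{X}^T$ has the central multivariate $t$
distribution with degrees of freedom $m - p - q + 1$ and correlation matrix
$\mathbf{a}^T\mathbf{Q}\mathbf{a}\,\mathbf{R}/(m - p - q + 1)$.

• If $\mathbf{a}$ is a $q \times 1$ vector such that $\mathbf{a}^T\mathbf{a} = 1$ and $\mathbf{b}$ is a $p \times 1$ vector, then
$\mathbf{a}^T\mathbf{X}^T\mathbf{b}$ is a linear combination of Student’s $t$ random variables.

• If $\mathbf{b}$ is a $p \times 1$ vector, then $\mathbf{X}^T\mathbf{b}$ has the central multivariate $t$
distribution with degrees of freedom $m - p - q + 1$ and correlation matrix
$\mathbf{b}^T\mathbf{R}\mathbf{b}\,\mathbf{Q}/(m - p - q + 1)$.

• If $\mathbf{a}$ is a $q \times 1$ vector and $\mathbf{b}$ is a $p \times 1$ vector, then

$$\frac{\sqrt{m - p - q + 1}\;\mathbf{a}^T\mathbf{X}^T\mathbf{b}}{\sqrt{(\mathbf{a}^T\mathbf{Q}\mathbf{a})(\mathbf{b}^T\mathbf{R}\mathbf{b})}}$$

has the Student’s $t$ distribution with degrees of freedom $m - p - q + 1$.

• In the special case $\mathbf{R} = \mathbf{I}_p$ and $\mathbf{Q} = \mathbf{I}_q$, if $a$ is a real number and $\mathbf{b}$
is a $q \times 1$ vector such that $a^2\mathbf{b}^T\mathbf{b} = 1$, then $a\mathbf{X}\mathbf{b}$ has the Student’s $t$
distribution with degrees of freedom $m - q$.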


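Javier and Gupta (1985) also derived a useful factorization of the
central matrix-variate $t$ density in terms of the product of $q - 1$ independent
univariate $F$ densities and $q$ independent multivariate $t$ densities,
paralleling the result of Tan (1969a) for matrix-variate beta distributions.
Let $\mathbf{X}$ be a $p \times q$ random matrix having the density (5.37) with $\mu = \mathbf{0}$.
Set

$$\mathbf{U} = \left(\mathbf{R}^{-1/2}\mathbf{X}\mathbf{Q}^{-1/2}\right)\left(\mathbf{R}^{-1/2}\mathbf{X}\mathbf{Q}^{-1/2}\right)^T,$$

so that $\mathbf{U}$ is $p \times p$, symmetric, and $\mathbf{U} > 0$. Partition $\mathbf{U}$ as

$$\mathbf{U} = \begin{pmatrix} \mathbf{U}_{11} & \mathbf{U}_{12} \\ \mathbf{U}_{21} & \mathbf{U}_{22} \end{pmatrix}$$

so that $\mathbf{U}_{11}$ is $1 \times 1$ and $\mathbf{U}_{22}$ is $(p - 1) \times (p - 1)$. Abbreviating
$\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ by $\Sigma_{22\cdot 1}$, define the following submatrices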


F 1 p 
u% = (us, sae ; J= 1l, 2, ssa p= 1, 


u®, = Uz1, 


l a 
UY = (UR). F=L2 PH, 
and 
uV = Un, 


so that UY), is (p — j) x (p — j) and UY is 1 x 1. With all of this 
notation the factorization of the density of X (due to Javier and Gupta) 
can be stated as 


q—2 y . , 
f(X) = [r (1 nte) 
j=0 


p 
x II tu {14u} (zi a UŞ; m =g= 1)a) ; 
E iom 


where $t_p(\mathbf{T}; \nu)$ is the joint pdf of a central multivariate $t$ distribution
with degrees of freedom $\nu$ and correlation matrix $\mathbf{T}$ and $F(\alpha, \beta)$ is the
pdf of a univariate $F$ distribution with degrees of freedom $\alpha$ and $\beta$.

The two predictivistic characterizations of the multivariate $t$
distribution based on (5.14) and (5.15) have the following matrix-variate
generalizations:


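• Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be an infinite sequence of $q$-dimensional random
column vectors that are orthogonally invariant (which means that, for
each $k$, $\mathbf{X}^{(k)} = (\mathbf{X}_1, \ldots, \mathbf{X}_k)^T$ and $\Gamma\mathbf{X}^{(k)}$ are identically distributed,
for all $k \times k$ orthogonal matrices $\Gamma$) and, for $k$ fixed, let $\mathbf{X}_i^{(k)} =
(\mathbf{X}_{(i-1)k+1}, \ldots, \mathbf{X}_{ik})^T$, $i = 1, 2, \ldots$. If $\mathbf{X}_1, \ldots, \mathbf{X}_q$ are linearly
independent with probability 1 and

$$E\left[\left(\mathbf{X}_2^{(p)}\right)^T\mathbf{X}_2^{(p)} \,\Big|\, \mathbf{X}_1^{(p)}\right] = a\left(\mathbf{X}_1^{(p)}\right)^T\mathbf{X}_1^{(p)} + \mathbf{B},$$

where $0 < a < 1$ and $\mathbf{B}$ is a $q \times q$ positive definite matrix, then
the distribution of $\mathbf{X}^{(p)}$ is the matrix-variate $t$ with $\mu = \mathbf{0}$, $\mathbf{R} = \mathbf{I}_p$,
$\mathbf{Q} = (1/a)\mathbf{B}$, and $m = 1 + (p/a) - p$.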

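• Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be an infinite sequence of $q$-dimensional random
column vectors such that, for each $p$, $\mathbf{X}^{(p)}$ and $\Gamma\mathbf{X}^{(p)}$ are identically
distributed, for all $p \times p$ orthogonal matrices $\Gamma$ satisfying $\Gamma\mathbf{1}_p = \mathbf{1}_p$
(where $\mathbf{1}_p$ is a $p$-dimensional vector of 1’s). Under this assumption
there exists a $\sigma$-algebra $\mathcal{T}$ of events such that

$$\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i \to E(\mathbf{X}_1 \mid \mathcal{T}) = \mathbf{M}$$

and

$$\frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{X}_i - \bar{\mathbf{X}}_n\right)\left(\mathbf{X}_i - \bar{\mathbf{X}}_n\right)^T \to E\left(\mathbf{X}_1\mathbf{X}_1^T \mid \mathcal{T}\right) - E(\mathbf{X}_1 \mid \mathcal{T})\left\{E\left(\mathbf{X}_1^T \mid \mathcal{T}\right)\right\} = \mathbf{V}$$

as $n \to \infty$ (Chow and Teicher, 1978), where the convergence is almost
everywhere. Moreover, if

$$E\left[\left(\mathbf{X}_2^{(p)} - \mathbf{1}_p\mathbf{M}^T\right)^T\left(\mathbf{X}_2^{(p)} - \mathbf{1}_p\mathbf{M}^T\right) \,\Big|\, \mathbf{X}_1^{(p)}, \mathbf{M}\right]
= a\left(\mathbf{X}_1^{(p)} - \mathbf{1}_p\mathbf{M}^T\right)^T\left(\mathbf{X}_1^{(p)} - \mathbf{1}_p\mathbf{M}^T\right) + \mathbf{B},$$

where $0 < a < 1$ and $\mathbf{B}$ is a $q \times q$ symmetric positive definite matrix,
then $\mathbf{X}^{(p)}$ is a location mixture of the matrix-variate $t$ distribution
with $\mu = \mathbf{1}_p\mathbf{M}^T$, $\mathbf{R} = \mathbf{I}_p$, $\mathbf{Q} = (1/a)\mathbf{B}$, and $m = 1 + (p/a) - p$. In
addition, $\mathbf{M}$ and $\mathbf{V}$ are independent.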


Dawid (1981) provided a different but more convenient
parameterization of (5.37). If $\mathbf{Y}$ ($p \times p$) has the standard matrix inverse Wishart
distribution with parameter $\delta$ and if, given $\mathbf{Y}$, $\mathbf{X}$ ($n \times p$) has the
matrix normal distribution with parameters $\mathbf{I}_n$ and $\mathbf{Y}$, then $\mathbf{X}$ is termed
as having the standard matrix $t$ distribution. In the notation of (5.37),
this would correspond to $\mu = \mathbf{0}$, $\mathbf{R} = \mathbf{I}_n$, $\mathbf{Q} = \mathbf{I}_p$, and $m = \delta + n + p - 1$.
Under Dawid’s parameterization, if $\mathbf{X}^*$ is an $n^* \times p^*$ submatrix of $\mathbf{X}$, then
$\mathbf{X}^*$ has the matrix $t$ distribution with parameters $\mathbf{I}_{n^*}$, $\mathbf{I}_{p^*}$, and $\delta$. Note
that $\delta$ is unchanged. This kind of consistency enabled Dawid (1981)
to construct what is termed as the standard infinite matrix $t$
distribution. Namely, $\mathbf{X} = \{x_{ij}, i \ge 1, j \ge 1\}$ is said to have the above-named
distribution if it has the property that for all $(n, p)$ the leading $n \times p$
submatrix of $\mathbf{X}$ has the standard matrix $t$ distribution with parameter
$\delta$. The standard matrix $t$ distribution also has the attractive property
of being spherical, that is, if $\mathbf{P}$ ($n \times n$) and $\mathbf{Q}$ ($p \times p$) are two orthogonal
matrices, then both $\mathbf{P}\mathbf{X}$ and $\mathbf{X}\mathbf{Q}$ have the same distribution as $\mathbf{X}$.


5.12 Complex Multivariate t Distribution 


A complex normal random variable $Y = V + \sqrt{-1}\,W$ is a complex random
variable whose real and imaginary parts possess the bivariate normal
distribution. A complex $p$-variate normal random vector

$$\mathbf{Y} = \mathbf{V} + \sqrt{-1}\,\mathbf{W} = \left(V_1 + \sqrt{-1}\,W_1, V_2 + \sqrt{-1}\,W_2, \ldots, V_p + \sqrt{-1}\,W_p\right)^T \tag{5.39}$$

is a $p$-tuple of complex normal random variables such that the vector of
real and imaginary parts $(V_1, W_1, \ldots, V_p, W_p)^T$ has the $2p$-variate normal
distribution (Goodman, 1963). Section 45.13 of Kotz et al. (2000)
provides an account of this distribution. It is usually assumed that the
$2p$-variate normal distribution of $(V_1, W_1, \ldots, V_p, W_p)^T$ has zero means
and covariance matrix given by

$$\frac{1}{2}\begin{pmatrix} \Sigma_1 & -\Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{pmatrix},$$

where $\Sigma_1$ is symmetric (matrix $\mathbf{A}$ is symmetric if $\mathbf{A}^T = \mathbf{A}$) and $\Sigma_2$
is skew-symmetric (matrix $\mathbf{A}$ is skew-symmetric if $\mathbf{A} = -\mathbf{A}^T$). From
the given structure it is easily seen that the covariances of the $p$-variate
vectors $\mathbf{V}$ and $\mathbf{W}$ are each equal to $\Sigma_1/2$ and the covariance between $\mathbf{V}$
and $\mathbf{W}$ is equal to $\Sigma_2/2$. Hence the covariance of the complex $p$-variate
normal random vector $\mathbf{Y}$ in (5.39) is $\Sigma_1 + \sqrt{-1}\,\Sigma_2 = \Sigma$, say. The
properties of the distribution of $\mathbf{Y}$ have been studied by many authors.
The joint pdf of $\mathbf{Y}$ is given by

$$f(\mathbf{y}) = \frac{1}{\pi^p\,|\Sigma|}\exp\left\{-\bar{\mathbf{y}}^T\Sigma^{-1}\mathbf{y}\right\}, \tag{5.40}$$

where $\bar{\mathbf{y}}$ denotes the complex conjugate of $\mathbf{y}$ (Goodman, 1963). For
example, for the complex univariate normal distribution, $y = v + \sqrt{-1}\,w$
and the covariance matrix $\Sigma = \sigma^2$, and thus the joint pdf of $Y$ becomes

$$f(y) = \frac{1}{\pi\sigma^2}\exp\left(-\frac{v^2 + w^2}{\sigma^2}\right).$$

The characteristic function of $\mathbf{Y}$ can be shown to be

$$E\left[\exp\left\{i\left(\mathbf{s}^T\mathbf{V} + \mathbf{t}^T\mathbf{W}\right)\right\}\right] = \exp\left\{-\frac{1}{4}\bar{\mathbf{u}}^T\Sigma\mathbf{u}\right\},$$

where $\mathbf{u} = \mathbf{s} + \sqrt{-1}\,\mathbf{t}$ (Wooding, 1956). Explicit expressions for the
moments of $\mathbf{Y}$ have been derived by Sultan and Tracy (1996). The
complex multivariate normal distribution has applications in describing
the statistical variability of estimators for the spectral density matrix of
a multiple stationary normal time series and in describing the statistical
variability of estimators for functions of the elements of a spectral density
matrix of a multiple stationary normal time series.
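
The exponent of the characteristic function can be checked against the real representation. The sketch below assumes that the real vector is stacked as $(\mathbf{V}^T, \mathbf{W}^T)^T$ with covariance $\frac{1}{2}\begin{pmatrix}\Sigma_1 & -\Sigma_2\\ \Sigma_2 & \Sigma_1\end{pmatrix}$, so that $\mathrm{Var}(\mathbf{s}^T\mathbf{V} + \mathbf{t}^T\mathbf{W})$ should equal $\frac{1}{2}\bar{\mathbf{u}}^T\Sigma\mathbf{u}$ and the normal characteristic function $\exp(-\mathrm{Var}/2)$ gives the factor $1/4$. The matrices and vectors are arbitrary illustrative values.

```python
def hermitian_form(s, t, sigma1, sigma2):
    """conj(u)' Sigma u with u = s + i t and Sigma = Sigma_1 + i Sigma_2."""
    p = len(s)
    u = [complex(s[j], t[j]) for j in range(p)]
    total = 0j
    for j in range(p):
        for k in range(p):
            total += u[j].conjugate() * complex(sigma1[j][k], sigma2[j][k]) * u[k]
    return total

def real_variance(s, t, sigma1, sigma2):
    """Var(s'V + t'W) under the stacked real covariance matrix."""
    p = len(s)
    coef = list(s) + list(t)
    cov = [[0.0] * (2 * p) for _ in range(2 * p)]
    for j in range(p):
        for k in range(p):
            cov[j][k] = 0.5 * sigma1[j][k]            # Cov(V, V)
            cov[j][p + k] = -0.5 * sigma2[j][k]       # upper-right block
            cov[p + j][k] = 0.5 * sigma2[j][k]        # lower-left block
            cov[p + j][p + k] = 0.5 * sigma1[j][k]    # Cov(W, W)
    return sum(coef[j] * cov[j][k] * coef[k]
               for j in range(2 * p) for k in range(2 * p))

sigma1 = [[2.0, 0.5], [0.5, 1.0]]     # symmetric part
sigma2 = [[0.0, 0.3], [-0.3, 0.0]]    # skew-symmetric part
s, t = [0.4, -1.1], [0.7, 0.2]

form = hermitian_form(s, t, sigma1, sigma2)
variance = real_variance(s, t, sigma1, sigma2)
print(form, 2.0 * variance)
```

The Hermitian form is real, and half of it equals the variance of the real linear combination.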

Relatively few results are available that deal with complex
multivariate $t$ distributions. Originally, the complex multivariate $t$ distribution
was introduced by Gupta (1964). Let $\mathbf{Y}$ have the complex $p$-variate
normal distribution with zero means, common variance $\sigma^2$, and covariance
matrix $\sigma^2\mathbf{R}$. Let $2\nu S^2/\sigma^2$ have the chi-squared distribution with
degrees of freedom $2\nu$, distributed independently of $\mathbf{Y}$. Then $\mathbf{X} = \mathbf{Y}/S$ is
said to have the complex $p$-variate $t$ distribution with degrees of freedom
$\nu$ and correlation matrix $\mathbf{R}$. By writing down the joint distribution of
$S$ and $\mathbf{X}$ and then integrating out $S$, the pdf of $\mathbf{X}$ can be obtained as

$$f(\mathbf{x}) = \frac{\Gamma(\nu + p)}{(\pi\nu)^p\,\Gamma(\nu)\,|\mathbf{R}|}\left\{1 + \frac{1}{\nu}\bar{\mathbf{x}}^T\mathbf{R}^{-1}\mathbf{x}\right\}^{-(\nu + p)}.$$

Tan (1973) discussed some properties of this distribution. Tan (1969b)
provided a brief discussion of a complex analog of the matrix-variate $t$
distribution given by (5.37).


5.13 Steyn’s Nonnormal Distributions 


Strictly speaking, this section does not deal with multivariate t distri- 
butions per se. This section is about nonnormal distributions arising 
from the class of multivariate elliptical distributions that contains the 
multivariate t as a particular case. 

One weakness of the class of multivariate elliptical distributions is that
all fourth-order cumulants are expressed in terms of a single kurtosis
parameter (moreover, the univariate marginals have zero skewness and the
same kurtosis). In fact, the cumulant generating function (cgf) and the
moment generating function (mgf) of a p-variate elliptical distribution
with zero means and correlation matrix $R$ are

\[
K(t_1, \ldots, t_p) = \frac{1}{2}\left(t^T R t\right) + \frac{\kappa}{8}\left(t^T R t\right)^2
+ \sum_{k \ge 3} A_k \left(t^T R t\right)^k \tag{5.41}
\]

and

\[
M(t_1, \ldots, t_p) = \exp\left(\frac{t^T R t}{2}\right)
\left\{1 + \frac{\kappa}{8}\left(t^T R t\right)^2 + \sum_{k \ge 3} B_k \left(t^T R t\right)^k\right\}, \tag{5.42}
\]

respectively, where $A_k$, $B_k$ are constants and $\kappa$ is the kurtosis parameter.
Steyn (1993) attempted to introduce meaningful multivariate distribu- 
tions that are related to the elliptical distributions and that contain 
more than one kurtosis parameter. 

As an example, consider a random vector $(X_1, X_2, X_3)$ possessing the
three-dimensional normal distribution with the mgf

\[
M(t_1, t_2, t_3) = \exp\left[\frac{1}{2}\left(t_1^2 + t_2^2 + t_3^2 + 2r_{12}t_1t_2 + 2r_{13}t_1t_3 + 2r_{23}t_2t_3\right)\right]. \tag{5.43}
\]


Suppose this model is placed in a changing environment that favors a
change in one of the random variables, say $X_1$, in such a way that the
kurtosis should be taken into consideration. Specifically, assume that
the marginal distribution of $X_1$ is elliptical with the kurtosis parameter
$\kappa_1$, while the conditional distribution of $(X_2, X_3)$ given $X_1 = x_1$ remains
unchanged. Note that (5.43) can be written as

\[
M(t_1, t_2, t_3) = \frac{1}{\sqrt{2\pi}}
\exp\left[\frac{1}{2}\left\{\left(1 - r_{12}^2\right)t_2^2 + \left(1 - r_{13}^2\right)t_3^2 + 2\left(r_{23} - r_{12}r_{13}\right)t_2t_3\right\}\right]
\times \int_{-\infty}^{\infty} \exp\left\{-\frac{x_1^2}{2} + \left(t_1 + r_{12}t_2 + r_{13}t_3\right)x_1\right\} dx_1. \tag{5.44}
\]

Changing the probability element in the integrand in (5.44) to that of
the elliptical distribution in (5.42), one can show that the mgf changes
to

\[
M_1(t_1, t_2, t_3) = M(t_1, t_2, t_3)\exp\left\{\frac{\kappa_1}{8}\left(t_1 + r_{12}t_2 + r_{13}t_3\right)^4 + \cdots\right\}.
\]

The corresponding cgf becomes

\[
K_1(t_1, t_2, t_3) = \frac{1}{2}\left(t_1^2 + t_2^2 + t_3^2 + 2r_{12}t_1t_2 + 2r_{13}t_1t_3 + 2r_{23}t_2t_3\right)
+ \frac{1}{8}\kappa_1\left(t_1 + r_{12}t_2 + r_{13}t_3\right)^4 + \cdots. \tag{5.45}
\]


Setting $t_2 = t_3 = 0$, the cgf of the marginal distribution of $X_1$ is given
by

\[
K_1(t_1, 0, 0) = \frac{1}{2}t_1^2 + \frac{1}{8}\kappa_1 t_1^4 + \cdots,
\]

which shows (as it should) that the marginal distribution of $X_1$ is elliptical
with kurtosis parameter $\kappa_1$ (compare with equation (5.41)). However,
for $t_1 = t_3 = 0$ and $t_1 = t_2 = 0$, one obtains

\[
K_1(0, t_2, 0) = \frac{1}{2}t_2^2 + \frac{1}{8}\kappa_1\left(r_{12}t_2\right)^4 + \cdots
\]

and

\[
K_1(0, 0, t_3) = \frac{1}{2}t_3^2 + \frac{1}{8}\kappa_1\left(r_{13}t_3\right)^4 + \cdots;
\]

thus, the marginal distributions of $X_2$ and $X_3$ are also elliptical but
with kurtosis parameters $\kappa_1 r_{12}^4$ and $\kappa_1 r_{13}^4$, respectively. Furthermore,
for $t_1 = 0$,

\[
K_1(0, t_2, t_3) = \frac{1}{2}\left(t_2^2 + 2r_{23}t_2t_3 + t_3^2\right) + \frac{1}{8}\kappa_1\left(r_{12}t_2 + r_{13}t_3\right)^4 + \cdots,
\]

which shows that the joint marginal distribution of $(X_2, X_3)$ is not
elliptical. The fourth-order cumulants are easily obtained from (5.45) as
$\kappa_{ijk} = 3\kappa_1 r_{12}^j r_{13}^k$, where $i + j + k = 4$.
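
The cumulant formula follows from a multinomial expansion of the quartic term in (5.45). The short derivation below is added as a worked step; it assumes only the usual convention that the cgf is $\sum \kappa_{ijk}\, t_1^i t_2^j t_3^k / (i!\,j!\,k!)$.

\[
\frac{\kappa_1}{8}\left(t_1 + r_{12}t_2 + r_{13}t_3\right)^4
= \frac{\kappa_1}{8}\sum_{i+j+k=4}\frac{4!}{i!\,j!\,k!}\, r_{12}^{j}\, r_{13}^{k}\, t_1^{i} t_2^{j} t_3^{k},
\qquad\text{so}\qquad
\kappa_{ijk} = \frac{\kappa_1}{8}\cdot 4!\; r_{12}^{j} r_{13}^{k} = 3\kappa_1 r_{12}^{j} r_{13}^{k}.
\]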

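Suppose now that the model given by (5.43) is placed in an environment
that favors a change in not only $X_1$ but also influences $(X_2, X_3)$.
Assume, in particular, that the conditional distributions of $X_2$ given
$X_1 = x_1$ and $X_3$ given $(X_1, X_2) = (x_1, x_2)$ are elliptical with kurtosis
parameters $\kappa_2$ and $\kappa_3$, respectively. Then calculations similar to those
above show that the mgf (5.43) changes to

\[
M_2(t_1, t_2, t_3) = M(t_1, t_2, t_3)
\exp\left[\frac{1}{8}\kappa_1 u_1^4 + \frac{1}{8}\kappa_2\left(\sigma_{2.1}^2 u_2^2\right)^2
+ \frac{1}{8}\kappa_3\left(\sigma_{3.12}^2 t_3^2\right)^2 + \cdots\right], \tag{5.46}
\]

where

\[
u_1 = t_1 + r_{12}t_2 + r_{13}t_3,
\qquad
u_2 = t_2 + \frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2}\,t_3,
\]

and

\[
\sigma_{3.12}^2 = 1 - r_{13}^2 - \frac{\left(r_{23} - r_{12}r_{13}\right)^2}{1 - r_{12}^2}.
\]

It is easily seen that the marginal distributions of $X_1$, $X_2$, and $X_3$
are elliptical with kurtosis parameters given by $\kappa_1$, $r_{12}^4\kappa_1 + \sigma_{2.1}^4\kappa_2$, and
$r_{13}^4\kappa_1 + \left(\sigma_{3.1}^2 - \sigma_{3.12}^2\right)^2\kappa_2 + \sigma_{3.12}^4\kappa_3$, where $\sigma_{j.1}^2 = 1 - r_{1j}^2$ for $j = 2, 3$.
This time, the fourth-order cumulants are given by

\[
\kappa_{ijk} = 3\left\{\kappa_1 r_{12}^{j} r_{13}^{k} + \kappa_2\,\sigma_{2.1}^{2(j-2)}\left(r_{23} - r_{12}r_{13}\right)^{k}\right\},
\]

where $i + j + k = 4$ (the $\kappa_2$ term being present only when $i = 0$). In the case
of $\kappa_{004}$, the term $\kappa_3\sigma_{3.12}^4$ should be added inside the braces.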

Similar constructions can be performed when $X = (X_1, \ldots, X_p)^T$ has
a p-variate normal distribution with zero means, covariance matrix $R$,
and the corresponding mgf

\[
M(t_1, \ldots, t_p) = \exp\left(\frac{1}{2}\,t^T R t\right). \tag{5.47}
\]

Consider two environments similar to those considered above for the
trivariate normal model. First, divide $X$ into two random vectors $X^{(1)} =
(X_1, \ldots, X_h)^T$ and $X^{(2)} = (X_{h+1}, \ldots, X_p)^T$, and let

\[
R = \begin{pmatrix} R_{11} & R_{12} \\ R_{12}^T & R_{22} \end{pmatrix}
\]

be the corresponding partition of the correlation matrix. Also let $t^{(1)} =
(t_1, \ldots, t_h)^T$ and $t^{(2)} = (t_{h+1}, \ldots, t_p)^T$ be the corresponding partition


of $t$. Now assume that the marginal distribution of $X^{(1)}$ is changed
to an h-dimensional elliptical distribution with kurtosis parameter $\kappa_1$
and that the conditional distribution of $X^{(2)}$ given $X^{(1)} = x^{(1)}$ remains
unchanged. Then calculations show that (5.47) changes to

\[
M_1(t) = M(t)\exp\left[\frac{1}{8}\kappa_1\left\{\left(t^{(1)} + R_{11}^{-1}R_{12}t^{(2)}\right)^T
R_{11}\left(t^{(1)} + R_{11}^{-1}R_{12}t^{(2)}\right)\right\}^2 + \cdots\right]. \tag{5.48}
\]

Fig. 5.5. Steyn's bivariate pdf corresponding to (5.46) for $t_3 = 0$ and (a)
$\kappa_1 = 0.8$, $\kappa_2 = -0.4$, and $r_{12} = 0.2$; (b) $\kappa_1 = 0.8$, $\kappa_2 = -0.4$, and $r_{12} = 0.8$;
(c) $\kappa_1 = -0.4$, $\kappa_2 = 0.8$, and $r_{12} = 0.2$; and (d) $\kappa_1 = -0.4$, $\kappa_2 = 0.8$, and
$r_{12} = 0.8$

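Clearly,

\[
M_1\left(t^{(1)}, 0\right) = \exp\left[\frac{1}{2}\,t^{(1)T} R_{11} t^{(1)} + \frac{1}{8}\kappa_1\left(t^{(1)T} R_{11} t^{(1)}\right)^2 + \cdots\right]
\]

and

\[
M_1\left(0, t^{(2)}\right) = \exp\left[\frac{1}{2}\,t^{(2)T} R_{22} t^{(2)} + \frac{1}{8}\kappa_1\left(t^{(2)T} R_{12}^T R_{11}^{-1} R_{12} t^{(2)}\right)^2 + \cdots\right],
\]

which shows that the marginal distribution of $X^{(1)}$ is an h-dimensional
elliptical distribution (as it should) while that of $X^{(2)}$ is not elliptical.
The second-order cumulants of (5.48) are the same as those for (5.47).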
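For the second construction, partition $X^{(2)}$ into $X^{(3)} = (X_{h+1}, \ldots, X_{h+s})^T$
and $X^{(4)} = (X_{h+s+1}, \ldots, X_p)^T$, and let $t^{(2)}$ be partitioned correspondingly
into $t^{(3)}$ and $t^{(4)}$. Let $C$ denote the conditional covariance matrix
of $X^{(2)}$ given $X^{(1)} = x^{(1)}$, that is,

\[
C = R_{22} - R_{12}^T R_{11}^{-1} R_{12},
\]

and let $C$ be partitioned as

\[
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{pmatrix},
\]

so that $C_{11}$ is $s \times s$, $C_{12}$ is $s \times (p-h-s)$, and $C_{22}$ is $(p-h-s) \times (p-h-s)$.
Now, assuming that the distribution of $X^{(1)}$ is elliptical with kurtosis
parameter $\kappa_1$ and that of $X^{(3)} - E\left(X^{(3)} \mid X^{(1)} = x^{(1)}\right)$ is elliptical with
kurtosis parameter $\kappa_2$, one can show that the mgf (5.47) changes to

\[
M_2(t) = M(t)\exp\left[\frac{1}{8}\kappa_1\left\{\left(t^{(1)} + R_{11}^{-1}R_{12}t^{(2)}\right)^T
R_{11}\left(t^{(1)} + R_{11}^{-1}R_{12}t^{(2)}\right)\right\}^2
+ \frac{1}{8}\kappa_2\left\{\left(t^{(3)} + C_{11}^{-1}C_{12}t^{(4)}\right)^T
C_{11}\left(t^{(3)} + C_{11}^{-1}C_{12}t^{(4)}\right)\right\}^2 + \cdots\right]. \tag{5.49}
\]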


This defines the mgf of a multivariate distribution that is equal to the
product of the mgf of the multivariate normal and a function of two
quadratic forms in $t$ depending on the two kurtosis parameters $\kappa_i$, $i = 1, 2$,
and on the elements of the normal covariance matrix. Setting $t^{(2)} = 0$
in (5.49), we see that $X^{(1)}$ has an h-dimensional elliptical distribution
with zero means, covariance matrix $R_{11}$, and kurtosis parameter $\kappa_1$ (as
it should). If either t® = 0 or t® = 0 and t® = 0, then M(t) 
becomes a function of three different. forms. 


5.14 Inverted Dirichlet Distribution 


There is a close connection between the multivariate t distribution defined
by (1.1) and the inverted Dirichlet distribution (Cornish, 1954;
Dunnett and Sobel, 1954). To see this, consider the central p-variate t
distribution with the pdf

\[
f(x) = \frac{\Gamma\left((\nu + p)/2\right)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}
\left\{1 + \frac{1}{\nu}\,x^T R^{-1} x\right\}^{-(\nu + p)/2}.
\]

Upon transforming to the canonical variables $Z = (Z_1, \ldots, Z_p)^T$, $Z = PX$,
where $P$ is a $p \times p$ matrix such that $P^T P = R^{-1}$, it is easily seen
that

\[
f(z) = \frac{\Gamma\left((\nu + p)/2\right)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}
\left\{1 + \frac{1}{\nu}\,z^T z\right\}^{-(\nu + p)/2}. \tag{5.50}
\]

In (5.50) now perform a further transformation $T_j = Z_j^2/\nu$, which is
one-to-one in each of $2^p$ regions with the Jacobian

\[
|J| = \frac{\nu^{p/2}}{2^p \sqrt{t_1 \cdots t_p}}.
\]

Consequently, the joint pdf of $T^T = (T_1, \ldots, T_p)$ becomes

\[
f(t) = \frac{\Gamma\left((\nu + p)/2\right)}{\pi^{p/2}\,\Gamma(\nu/2)}
\prod_{j=1}^{p} t_j^{-1/2}\left(1 + \sum_{j=1}^{p} t_j\right)^{-(\nu + p)/2},
\]

which is the inverted p-dimensional Dirichlet distribution $D'(1/2, \ldots,
1/2; \nu/2)$; see, for example, Kotz et al. (2000, Chapter 49).
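
The change of variables above can be checked numerically. The following is a minimal sketch, not taken from the source: the function names are illustrative, and it simply compares the inverted Dirichlet density with the density obtained from (5.50) by summing the $2^p$ branches of $T_j = Z_j^2/\nu$.

```python
import math

def t_canonical_pdf(z, nu):
    """Density (5.50) of the canonical variables Z at the point z (a list)."""
    p = len(z)
    const = math.gamma((nu + p) / 2) / ((math.pi * nu) ** (p / 2) * math.gamma(nu / 2))
    return const * (1 + sum(v * v for v in z) / nu) ** (-(nu + p) / 2)

def inverted_dirichlet_pdf(t, nu):
    """Density of D'(1/2, ..., 1/2; nu/2) at the point t (positive entries)."""
    p = len(t)
    const = math.gamma((nu + p) / 2) / (math.pi ** (p / 2) * math.gamma(nu / 2))
    return const * math.prod(v ** -0.5 for v in t) * (1 + sum(t)) ** (-(nu + p) / 2)

def pdf_via_change_of_variables(t, nu):
    """Density of T_j = Z_j^2 / nu: 2^p symmetric branches times the Jacobian."""
    p = len(t)
    z = [math.sqrt(nu * v) for v in t]
    jacobian = nu ** (p / 2) / (2 ** p * math.sqrt(math.prod(t)))
    return 2 ** p * t_canonical_pdf(z, nu) * jacobian
```

The two densities agree at any point with positive coordinates, which is exactly the statement that $T$ follows $D'(1/2, \ldots, 1/2; \nu/2)$.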


6 
Probability Integrals 


A substantial amount of research has been carried out on
probability integrals of multivariate t distributions. Most of the work
was done during the pre-computer era, but recently several computer
programs have been written to evaluate probability integrals.

Sections 6.1 to 6.7 by now may have lost some of their usefulness but 
are still of substantial historical interest in addition to their mathemati- 
cal value. We have decided to record these results in some detail in this 
book in spite of the fact that some of the expressions are quite lengthy 
and cumbersome. Sections 6.8 to 6.13 contain more practically relevant 
and modern results. 


6.1 Dunnett and Sobel’s Probability Integrals 


One of the earliest results on probability integrals is that due to Dunnett
and Sobel (1954). Let $(X_1, X_2)$ have the central bivariate t distribution
with degrees of freedom $\nu$ and the equicorrelation structure $r_{ij} = \rho$,
$i \ne j$. The corresponding bivariate pdf is

\[
f(x_1, x_2; \nu, \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}}
\left\{1 + \frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{\nu\left(1 - \rho^2\right)}\right\}^{-(\nu + 2)/2} \tag{6.1}
\]

with the probability integral

\[
P(y_1, y_2; \nu, \rho) = \int_{-\infty}^{y_2}\int_{-\infty}^{y_1} f(x_1, x_2; \nu, \rho)\,dx_1\,dx_2. \tag{6.2}
\]

Let

\[
x(m, y_1, y_2) = \frac{(y_1 - \rho y_2)^2}{(y_1 - \rho y_2)^2 + \left(1 - \rho^2\right)\left(m + y_2^2\right)}
\]


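and let

\[
I_{x(m, y_1, y_2)}(a, b) = \int_0^{x(m, y_1, y_2)} \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,y^{a-1}(1 - y)^{b-1}\,dy
\]

denote the incomplete beta function. Dunnett and Sobel (1954) evaluated
exact expressions for (6.2) when $\nu$ takes on positive integer values.
For even $\nu$ and odd $\nu$, they obtained

\[
\begin{aligned}
P(y_1, y_2; \nu, \rho) = {} & \frac{1}{2\pi}\arctan\frac{\sqrt{1 - \rho^2}}{-\rho} \\
& + \frac{y_2}{4\sqrt{\nu\pi}}\sum_{j=1}^{\nu/2}\frac{\Gamma(j - 1/2)}{\Gamma(j)}\left(1 + \frac{y_2^2}{\nu}\right)^{1/2 - j}
\left\{1 + \operatorname{sgn}(y_1 - \rho y_2)\,I_{x(\nu, y_1, y_2)}\left(\frac{1}{2}, j - \frac{1}{2}\right)\right\} \\
& + \frac{y_1}{4\sqrt{\nu\pi}}\sum_{j=1}^{\nu/2}\frac{\Gamma(j - 1/2)}{\Gamma(j)}\left(1 + \frac{y_1^2}{\nu}\right)^{1/2 - j}
\left\{1 + \operatorname{sgn}(y_2 - \rho y_1)\,I_{x(\nu, y_2, y_1)}\left(\frac{1}{2}, j - \frac{1}{2}\right)\right\}
\end{aligned} \tag{6.3}
\]

and

\[
\begin{aligned}
P(y_1, y_2; \nu, \rho) = {} & \frac{1}{2\pi}\arctan\left\{\sqrt{\nu}\,\frac{-\alpha\beta - \gamma\sqrt{\delta}}{\gamma\beta - \alpha\nu\sqrt{\delta}}\right\} \\
& + \frac{y_2}{4\sqrt{\nu\pi}}\sum_{j=1}^{(\nu-1)/2}\frac{\Gamma(j)}{\Gamma(j + 1/2)}\left(1 + \frac{y_2^2}{\nu}\right)^{-j}
\left\{1 + \operatorname{sgn}(y_1 - \rho y_2)\,I_{x(\nu, y_1, y_2)}\left(\frac{1}{2}, j\right)\right\} \\
& + \frac{y_1}{4\sqrt{\nu\pi}}\sum_{j=1}^{(\nu-1)/2}\frac{\Gamma(j)}{\Gamma(j + 1/2)}\left(1 + \frac{y_1^2}{\nu}\right)^{-j}
\left\{1 + \operatorname{sgn}(y_2 - \rho y_1)\,I_{x(\nu, y_2, y_1)}\left(\frac{1}{2}, j\right)\right\},
\end{aligned} \tag{6.4}
\]

respectively. Here,

\[
\alpha = y_1 + y_2, \qquad
\beta = y_1 y_2 + \rho\nu, \qquad
\gamma = y_1 y_2 - \nu, \qquad
\delta = y_1^2 - 2\rho y_1 y_2 + y_2^2 + \nu\left(1 - \rho^2\right).
\]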
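
The even-$\nu$ formula (6.3) can be evaluated directly. The sketch below is illustrative only and makes two assumptions that are not stated in the text: the arctangent is taken in $[0, \pi]$ (implemented with `atan2`), and the incomplete beta function $I_x(1/2, b)$ is computed by the substitution $y = \sin^2\theta$ followed by Simpson's rule. The helper names are hypothetical.

```python
import math

def x_arg(m, y1, y2, rho):
    """x(m, y1, y2) as defined before (6.3)."""
    num = (y1 - rho * y2) ** 2
    return num / (num + (1 - rho ** 2) * (m + y2 ** 2))

def inc_beta_half(x, b, n=2000):
    """I_x(1/2, b) via y = sin^2(theta); the integrand becomes 2 cos^(2b-1)(theta)."""
    theta_max = math.asin(math.sqrt(x))
    if theta_max == 0.0:
        return 0.0
    h = theta_max / n
    power = 2 * b - 1
    total = 1.0 + math.cos(theta_max) ** power  # endpoint values (cos(0) = 1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** power
    const = math.gamma(b + 0.5) / (math.gamma(0.5) * math.gamma(b))
    return 2 * const * total * h / 3

def sign(v):
    return (v > 0) - (v < 0)

def dunnett_sobel_even(y1, y2, nu, rho):
    """P(y1, y2; nu, rho) from (6.3); nu must be an even positive integer."""
    if nu % 2:
        raise ValueError("nu must be even")
    total = math.atan2(math.sqrt(1 - rho ** 2), -rho) / (2 * math.pi)
    # Each pass handles one of the two sums: 'a' is the leading factor,
    # 'b' is the other limit of integration.
    for a, b in ((y1, y2), (y2, y1)):
        s = 0.0
        for j in range(1, nu // 2 + 1):
            ratio = math.gamma(j - 0.5) / math.gamma(j)
            inc = inc_beta_half(x_arg(nu, b, a, rho), j - 0.5)
            s += ratio * (1 + a * a / nu) ** (0.5 - j) * (1 + sign(b - rho * a) * inc)
        total += a / (4 * math.sqrt(nu * math.pi)) * s
    return total
```

Two sanity checks are available in closed form: at the origin the value is $1/4 + \arcsin(\rho)/(2\pi)$, and letting one limit grow large recovers the univariate Student cdf, which for $\nu = 2$ is $1/2 + y/(2\sqrt{2 + y^2})$.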


In the special case $y_1 = y_2 = 0$, both (6.3) and (6.4) reduce to the neat
expression

\[
P(0, 0; \nu, \rho) = \frac{1}{2\pi}\arctan\frac{\sqrt{1 - \rho^2}}{-\rho}, \tag{6.5}
\]

where the arctangent is taken in $[0, \pi]$. This expression
is independent of $\nu$ and is therefore identical with the corresponding
result for the bivariate normal integral. Since the number of terms
in (6.3) and (6.4) increases with $\nu$, the usefulness of these expressions
is confined to small values of $\nu$. Dunnett and Sobel (1954) also derived
an asymptotic expansion in powers of $1/\nu$, the first few terms of which
yield a good approximation to the probability integral even for moderately
small values of $\nu$. The method of derivation is essentially the
same as that used by Fisher (1925) to approximate the probability integral
of the univariate Student's t distribution: Express the difference
$f(x_1, x_2; \nu, \rho) - f(x_1, x_2; \infty, \rho)$ as a power series in $1/\nu$ and then integrate
this series term by term over the desired region of integration. Setting

\[
r^2 = \frac{y_1^2 - 2\rho y_1 y_2 + y_2^2}{1 - \rho^2},
\]

Dunnett and Sobel obtained

\[
\begin{aligned}
\frac{f(y_1, y_2; \nu, \rho)}{f(y_1, y_2; \infty, \rho)}
= {} & 1 + \left(\frac{r^4}{4} - r^2\right)\frac{1}{\nu}
+ \left(\frac{r^8}{32} - \frac{5r^6}{12} + r^4\right)\frac{1}{\nu^2} \\
& + \left(\frac{r^{12}}{384} - \frac{7r^{10}}{96} + \frac{13r^8}{24} - r^6\right)\frac{1}{\nu^3} \\
& + \left(\frac{r^{16}}{6144} - \frac{r^{14}}{128} + \frac{17r^{12}}{144} - \frac{77r^{10}}{120} + r^8\right)\frac{1}{\nu^4} + \cdots \\
= {} & 1 + D(r),
\end{aligned}
\]

say. Thus, the desired probability integral is

\[
P(y_1, y_2; \nu, \rho) = \int_{-\infty}^{y_2}\int_{-\infty}^{y_1} f(x_1, x_2; \infty, \rho)\,dx_1\,dx_2
+ \int_{-\infty}^{y_2}\int_{-\infty}^{y_1} D(r)\,f(x_1, x_2; \infty, \rho)\,dx_1\,dx_2. \tag{6.6}
\]
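
The coefficients of the expansion $1 + D(r)$ can be confirmed numerically, since the exact ratio follows from (6.1) and its normal limit: $f(\cdot; \nu, \rho)/f(\cdot; \infty, \rho) = (1 + r^2/\nu)^{-(\nu+2)/2}\exp(r^2/2)$. The sketch below is a check added for illustration; the function names are not from the source.

```python
import math

def exact_ratio(r, nu):
    """f(y1, y2; nu, rho) / f(y1, y2; infinity, rho), written in terms of r."""
    q = r * r
    return (1 + q / nu) ** (-(nu + 2) / 2) * math.exp(q / 2)

def series_ratio(r, nu):
    """1 + D(r) truncated after the 1/nu^4 term."""
    q = r * r
    c1 = q**2 / 4 - q
    c2 = q**4 / 32 - 5 * q**3 / 12 + q**2
    c3 = q**6 / 384 - 7 * q**5 / 96 + 13 * q**4 / 24 - q**3
    c4 = q**8 / 6144 - q**7 / 128 + 17 * q**6 / 144 - 77 * q**5 / 120 + q**4
    return 1 + c1 / nu + c2 / nu**2 + c3 / nu**3 + c4 / nu**4
```

The truncation error is of order $1/\nu^5$, so the two functions should agree closely once $\nu$ is moderately large relative to $r^2$.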


The first term on the right-hand side of (6.6) is the integral of the bivariate
normal pdf, and it has been tabulated by Pearson (1931) with
a series of correction terms. The second term can be integrated term
by term to obtain an asymptotic expansion in powers of $1/\nu$. Dunnett
and Sobel gave expressions for the coefficients $A_k$ of the terms $1/\nu^k$ for
$k = 1, 2, 3, 4$. The first of these coefficients takes the form


A, = 2 4(a)6(u2) + Hoan) - BAY 6 an) Ba) 


4 
-HWD 5) 00, 


where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard
normal distribution, and

\[
u_1 = \frac{y_1 - \rho y_2}{\sqrt{1 - \rho^2}}, \qquad
u_2 = \frac{y_2 - \rho y_1}{\sqrt{1 - \rho^2}}.
\]


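In the special case $y_1 = y_2 = y$, (6.6) reduces to

\[
P(y, y; \nu, \rho) = \int_{-\infty}^{y}\int_{-\infty}^{y} f(x_1, x_2; \infty, \rho)\,dx_1\,dx_2
+ \frac{A_1}{\nu} + \frac{A_2}{\nu^2} + \frac{A_3}{\nu^3} + \frac{A_4}{\nu^4} + \cdots \tag{6.7}
\]

with the first two coefficients $A_1$ and $A_2$ now taking the forms

\[
A_1 = -\frac{y\phi(y)}{2}\left\{\left(y^2 + 1\right)\Phi(cy) - cy\,\phi(cy)\right\}
\]

and

\[
A_2 = -\frac{y\phi(y)}{48}\left\{\left(3y^6 - 7y^4 - 5y^2 - 3\right)\Phi(cy)
- cy\,\phi(cy)\left[3y^4\left(c^4 + 3c^2 + 3\right) - y^2\left(c^2 + 5\right) - 3\right]\right\},
\]

where $c = \sqrt{(1 - \rho)/(1 + \rho)}$. In this special case, Dunnett and Sobel
(1954) tabulated numerical values of the coefficients $A_k$ for selected values
of $\rho$, $y$, and $\nu$. The following table gives the values for $\rho = 0.5$.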

Coefficients of the asymptotic expansion (6.7) for $\rho = 0.5$

   y        A1          A2          A3          A4
  0.25   -0.025870    0.003371    0.003816   -0.001050
  0.50   -0.057784    0.008999    0.006868   -0.002155
  0.75   -0.100016    0.021983    0.006891   -0.001879
  1.00   -0.150182    0.047374   -0.006835    0.007991
  1.25   -0.198378    0.079687   -0.033130    0.036817
  1.50   -0.231628    0.096254   -0.038696    0.032808
  1.75   -0.240531    0.067469    0.052274   -0.191482
  2.00   -0.223682   -0.020268    0.293449   -0.819219
  2.25   -0.187525   -0.149011    0.623867   -1.618705
  2.50   -0.142571   -0.276255    0.858993   -1.765249
  3.00   -0.062685   -0.376815    0.432592    2.236773


These values can be used to construct tables for the probability integral 


in (6.7). 
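
As an illustration of how such tables can be reproduced, the sketch below evaluates the closed forms of $A_1$ and $A_2$ for the special case $y_1 = y_2 = y$, namely $A_1 = -\tfrac{1}{2}y\phi(y)\{(y^2+1)\Phi(cy) - cy\,\phi(cy)\}$ and the analogous expression for $A_2$ with $c = \sqrt{(1-\rho)/(1+\rho)}$. The function names are illustrative, and $\Phi$ is computed from `math.erf`; this is a check against the tabulated values for $\rho = 0.5$, not Dunnett and Sobel's own computational procedure.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def a1(y, rho):
    """First coefficient of the expansion (6.7)."""
    c = math.sqrt((1 - rho) / (1 + rho))
    return -0.5 * y * phi(y) * ((y**2 + 1) * Phi(c * y) - c * y * phi(c * y))

def a2(y, rho):
    """Second coefficient of the expansion (6.7)."""
    c = math.sqrt((1 - rho) / (1 + rho))
    poly = 3 * y**6 - 7 * y**4 - 5 * y**2 - 3
    bracket = 3 * y**4 * (c**4 + 3 * c**2 + 3) - y**2 * (c**2 + 5) - 3
    return -y * phi(y) / 48 * (poly * Phi(c * y) - c * y * phi(c * y) * bracket)
```

With $\rho = 0.5$ these functions reproduce the $A_1$ and $A_2$ columns of the table to the printed precision, for example at $y = 1.00$ and $y = 2.00$.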


6.2 Gupta and Sobel’s Probability Integrals 


Gupta and Sobel (1957) investigated the special case when $X$ follows
the central p-variate t distribution with degrees of freedom $\nu$ and the
correlation structure $r_{ij} = \rho = 1/2$, $i \ne j$. If $Y_1, Y_2, \ldots, Y_p, Y$ are
independent normal random variables with common mean and common
variance $\sigma^2$, and if $\nu S^2/\sigma^2$ is a chi-squared random variable with degrees
of freedom $\nu$, independent of $Y_1, Y_2, \ldots, Y_p, Y$, then one can rewrite the
probability integral as

\[
\begin{aligned}
P(d) &= \int_{-\infty}^{d}\cdots\int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu, \rho)\,dx_p \cdots dx_1 \\
&= \Pr\left\{\frac{Y_1 - Y}{\sqrt{2}\,S} < d, \ldots, \frac{Y_p - Y}{\sqrt{2}\,S} < d\right\} \\
&= \Pr\left(\frac{M_p - Y}{S} < \sqrt{2}\,d\right) \\
&= \Pr\left(Z < \sqrt{2}\,d\right),
\end{aligned} \tag{6.8}
\]

where $M_p = \max(Y_1, Y_2, \ldots, Y_p)$ and $Z = (M_p - Y)/S$. Gupta and
Sobel (1957) provided four useful expressions for $P(d)$. These are by now
classical results applicable in statistical inference. The first expression
is derived by fixing $Y$ and $S$ in (6.8) and integrating with respect to $M_p$:

\[
P(d) = \int_0^{\infty} h(s)\left[\int_{-\infty}^{\infty} \Phi^p(y)\,\phi\left(y - \sqrt{2}\,ds\right)dy\right]ds, \tag{6.9}
\]


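where $\phi$ and $\Phi$ are, respectively, the pdf and cdf of the standard normal
distribution and $h$ is the pdf of the chi-squared distribution with $\nu$ degrees
of freedom. Based on the fact that the pdf $\phi$ admits an expansion
about $d = 0$, it is easy to justify a term-by-term integration of (6.9) to
obtain the second expression

\[
P(d) = \frac{1}{p + 1} + \sum_{k=1}^{\infty}\frac{2^{k/2} d^k}{k!}\,A_k\int_{-\infty}^{\infty} H_k(y)\,\Phi^p(y)\,\phi(y)\,dy,
\]

where

\[
A_k = E\left\{\left(\frac{S}{\sigma}\right)^k\right\}
= \frac{\Gamma\left((\nu + k)/2\right)}{(\nu/2)^{k/2}\,\Gamma(\nu/2)} \tag{6.10}
\]

is the kth moment of $\chi_\nu/\sqrt{\nu}$ (provided that $k > -\nu$) and $H_k$ is the kth
Hermite polynomial defined by

\[
\left(-\frac{d}{dx}\right)^k \exp\left(-\frac{x^2}{2}\right) = H_k(x)\exp\left(-\frac{x^2}{2}\right). \tag{6.11}
\]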
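
The two ingredients of this expansion, the moments (6.10) and the Hermite polynomials (6.11), are simple to compute. The sketch below is an illustration under stated assumptions: the Hermite polynomials are generated with the standard recurrence $H_{n+1}(x) = xH_n(x) - nH_{n-1}(x)$, which matches the definition (6.11), and the function names are hypothetical. It also checks the Taylor expansion $\phi(y - a) = \phi(y)\sum_k a^k H_k(y)/k!$ that underlies the term-by-term integration.

```python
import math

def hermite(k, x):
    """Hermite polynomial H_k(x) of (6.11), via H_{n+1} = x H_n - n H_{n-1}."""
    h_prev, h = 1.0, x
    if k == 0:
        return h_prev
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

def moment_a(k, nu):
    """A_k of (6.10): the k-th moment of chi_nu / sqrt(nu), for k > -nu."""
    return (2 / nu) ** (k / 2) * math.gamma((nu + k) / 2) / math.gamma(nu / 2)

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def shifted_phi_series(y, a, terms=40):
    """Expansion of phi(y - a) about a = 0 in Hermite polynomials."""
    return phi(y) * sum(a**k / math.factorial(k) * hermite(k, y) for k in range(terms))
```

For instance, `moment_a(2, nu)` equals 1 for every `nu`, since $E(\chi_\nu^2/\nu) = 1$.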


A third expression for P(d) is derived by first expanding ¢ about S =o 
and then integrating term by term, obtaining 


OU vd) — a9 (ov) = (5) 
+d¢ (y— V2d) E (2 m 1) i ja 
[2o (u- 3d) ay 
-V3a(1- As) f” (y- Vd) 8°(u)0 (y - Va) ay 


P(d) 


+2d? (1 -a f fy? — 2/2dy + 2d? — i} 
x BP(y)p (v- v2d) dy +-->, 


where $A_k$ is given by (6.10). Each of the integrals above can be evaluated
by expanding the pdf $\phi$ about $d = 0$, as was done in (6.9). The fourth
and final expression for $P(d)$ given by Gupta and Sobel (1957) uses
the result of Seal (1954) that the distribution of $D = (M_p - Y)/\sigma$ is
asymptotically normal as $p$ tends to infinity. It follows directly from
Seal's result that the third and higher central moments of $D$ tend to
the corresponding moments of the standard normal distribution. Since
the coefficients involving $\nu$ in $A_k$ in (6.10) tend to unity as $\nu \to \infty$, it
follows that the third and higher central moments of $Z = (M_p - Y)/S$
tend to the corresponding moments of the standard normal distribution
as both $\nu$ and $p$ tend to infinity. It is therefore reasonable to approximate
the distribution of $W = (Z - E(Z))/\sqrt{\operatorname{Var}(Z)}$ by a Gram-Charlier
expansion in the Edgeworth form, where

\[
E(Z) = A_{-1}a_{p,1}
\]

and

\[
\operatorname{Var}(Z) = A_{-2}\left(a_{p,2} + 1\right) - \left(A_{-1}a_{p,1}\right)^2.
\]

Here, $a_{p,i}$ denotes the ith moment of the largest of $p$ independent standard
normal random variables. Using equation (17.7.3) of Cramér (1951)
and letting $d_s = (\sqrt{2}\,d - E(Z))/\sqrt{\operatorname{Var}(Z)}$, Gupta and Sobel obtained


P(d) = Pr (Z < v2a) 
= B(d,) - FA” (ds) 
+6 (dy) + 36 (a) 
35 

-F (4) (d) — ee (a ds) — Ta (8) (d,) +, 

where 
ee | 
Qk = Jrz (6.12) 


is the kth standardized cumulant of Z obtained from the moments 
around the origin. 

In a related development, Gupta (1963) studied the above case p = 1/2 
and showed that P(d) = P(d; v) satisfies 


dP (d; v) 
dd 


which is Hartley’s differential-difference equation for the probability in- 
tegral of a general class of statistics known as Studentized statistics. 


+v {P(d;v)-— P(dq,v +2)} = 0, (6.13) 




Using Hartley's solution (obtained using the theory of characteristics),
Gupta obtained an approximation for $P(d; \nu)$ in powers of $1/\nu$ and
remarked that it can be computed by using the Gauss-Hermite quadrature.
Gupta et al. (1985) extended this result for any $\rho > 0$ and showed that
$P(d)$ satisfies (6.13) in this case too. In this case the approximation for
$P(d)$ in powers of $1/\nu$ is

\[
P(d) = G(d, \ldots, d) + \sum_{k=1}^{\infty} L_k(d), \tag{6.14}
\]

where $L_k$ is the kth correction term and $G$ is the joint cdf of a p-variate
normal distribution with zero means, common variance $\sigma^2$, and the
equicorrelation structure $r_{ij} = \rho$, $i \ne j$. Letting $G^{(k)}(d)$ denote the
kth-order derivative of $G(d, \ldots, d)$ with respect to $d$, the first four
correction terms can be written as

\[
\begin{aligned}
L_1(d) &= \frac{1}{\nu}\left\{a^{(2)} - a^{(1)}\right\}, \\
L_2(d) &= \frac{1}{6\nu^2}\left\{3a^{(4)} - 10a^{(3)} + 9a^{(2)} - 2a^{(1)}\right\}, \\
L_3(d) &= \frac{1}{6\nu^3}\left\{a^{(6)} - 7a^{(5)} + 17a^{(4)} - 17a^{(3)} + 6a^{(2)}\right\},
\end{aligned}
\]

and

\[
L_4(d) = \frac{1}{360\nu^4}\left\{15a^{(8)} - 180a^{(7)} + 830a^{(6)} - 1848a^{(5)} + 2015a^{(4)}
- 900a^{(3)} + 20a^{(2)} + 48a^{(1)}\right\},
\]

where

\[
a^{(k)} = \frac{1}{2^k}\,y^{(k)}(d), \qquad k = 1, 2, \ldots, 8,
\]

and the first eight $y^{(k)}(d)$ are

\[
\begin{aligned}
y^{(1)}(d) &= dG^{(1)}(d), \\
y^{(2)}(d) &= d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(3)}(d) &= d^3G^{(3)}(d) + 3d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(4)}(d) &= d^4G^{(4)}(d) + 6d^3G^{(3)}(d) + 7d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(5)}(d) &= d^5G^{(5)}(d) + 10d^4G^{(4)}(d) + 25d^3G^{(3)}(d) + 15d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(6)}(d) &= d^6G^{(6)}(d) + 15d^5G^{(5)}(d) + 65d^4G^{(4)}(d) + 90d^3G^{(3)}(d) + 31d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(7)}(d) &= d^7G^{(7)}(d) + 21d^6G^{(6)}(d) + 140d^5G^{(5)}(d) + 350d^4G^{(4)}(d) \\
&\quad + 301d^3G^{(3)}(d) + 63d^2G^{(2)}(d) + dG^{(1)}(d), \\
y^{(8)}(d) &= d^8G^{(8)}(d) + 28d^7G^{(7)}(d) + 266d^6G^{(6)}(d) + 1050d^5G^{(5)}(d) \\
&\quad + 1701d^4G^{(4)}(d) + 966d^3G^{(3)}(d) + 127d^2G^{(2)}(d) + dG^{(1)}(d).
\end{aligned} \tag{6.15}
\]

Thus the evaluation of $P(d)$ in (6.14) involves that of $G^{(k)}$ for $k =
0, 1, \ldots, 8$, and we shall discuss in Chapter 8 how the latter can be
performed.
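
The correction terms can be sanity-checked on a case where everything is available in closed form. The sketch below is a formal check under explicit assumptions that are not in the text: it takes $\sigma = 1$, treats $P(d)$ as $E\{G(dS, \ldots, dS)\}$ with $S = \chi_\nu/\sqrt{\nu}$, and replaces $G$ by the test function $G(d) = d^{2c}$ (which is not a cdf). For that choice $y^{(k)}(d) = (2c)^k d^{2c}$, so $a^{(k)} = c^k$ at $d = 1$, and the exact answer is the moment $A_{2c}$ of (6.10). The function names are illustrative.

```python
import math

def moment_a(n, nu):
    """A_n of (6.10): the n-th moment of chi_nu / sqrt(nu)."""
    return (2 / nu) ** (n / 2) * math.gamma((nu + n) / 2) / math.gamma(nu / 2)

def corrected_series(c, nu):
    """G + L_1 + L_2 + L_3 + L_4 at d = 1 for the test function G(d) = d**(2c)."""
    a = [c**k for k in range(9)]  # a[k] stands for a^(k); a[0] is unused
    l1 = (a[2] - a[1]) / nu
    l2 = (3 * a[4] - 10 * a[3] + 9 * a[2] - 2 * a[1]) / (6 * nu**2)
    l3 = (a[6] - 7 * a[5] + 17 * a[4] - 17 * a[3] + 6 * a[2]) / (6 * nu**3)
    l4 = (15 * a[8] - 180 * a[7] + 830 * a[6] - 1848 * a[5] + 2015 * a[4]
          - 900 * a[3] + 20 * a[2] + 48 * a[1]) / (360 * nu**4)
    return 1 + l1 + l2 + l3 + l4
```

For $c = 2$ the exact moment is $1 + 2/\nu$ and the terms $L_2$, $L_3$, $L_4$ vanish identically; for non-integer $c$ the truncated series differs from the exact moment by a term of order $1/\nu^5$.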


6.3 John’s Probability Integrals 


John (1961) provided alternative formulas for the evaluation of the probability
integral. Although the method is discussed in detail only for the
bivariate case, it has wider applicability in the sense that it can be
adopted to obtain the probability integral of the multivariate t distribution
for any dimension.

Let $X$ be a p-variate vector having the central t distribution with
degrees of freedom $\nu$ and correlation matrix $R$. Using the definition
that $X$ can be represented as $(Z_1/S, Z_2/S, \ldots, Z_p/S)$, where $Z$ is a
p-variate normal random vector with correlation matrix $R$ and $\nu S^2/\sigma^2$ is
an independent chi-squared random variable with degrees of freedom $\nu$,
one can show that the characteristic function of $X$ is

\[
E\left(\exp\left(it^T X\right)\right) = E\left(E\left(\exp\left(it^T Z/s\right) \mid S = s\right)\right)
= \frac{1}{\Gamma(\nu/2)}\int_0^{\infty} z^{\nu/2 - 1}\exp\left(-z - \frac{\nu}{4z}\,t^T R t\right)dz.
\]

In the case $p = 2$ with the equicorrelation structure $r_{ij} = \rho$, $i \ne j$, the
above expression reduces to

\[
E\left(\exp\left(it_1X_1 + it_2X_2\right)\right)
= \frac{1}{\Gamma(\nu/2)}\int_0^{\infty} z^{\nu/2 - 1}
\left\{\sum_{i=0}^{\infty}\frac{1}{i!}\left(-\frac{\nu\rho\,t_1t_2}{2z}\right)^i\right\}
\exp\left\{-z - \frac{\nu}{4z}\left(t_1^2 + t_2^2\right)\right\}dz.
\]

By the inversion theorem, John (1961) derived the corresponding joint
pdf as an infinite series of one-dimensional integrals. Integrating the
infinite series term by term, the probability integral becomes

\[
P(y_1, y_2; \nu, \rho) = \psi_{\nu,0}(y_1, y_2) + \frac{1}{2\pi}\sum_{i=1}^{\infty}\frac{\rho^i}{i!}\,\psi_{\nu,i}(y_1, y_2),
\]

where

\[
\psi_{\nu,0}(y_1, y_2) = \frac{1}{\Gamma(\nu/2)}\int_0^{\infty} z^{\nu/2 - 1}\exp(-z)\,
\Phi\left(\sqrt{\frac{2z}{\nu}}\,y_1\right)\Phi\left(\sqrt{\frac{2z}{\nu}}\,y_2\right)dz
\]

and

\[
\psi_{\nu,i}(y_1, y_2) = \frac{1}{\Gamma(\nu/2)}\int_0^{\infty} x^{\nu/2 - 1}
\exp\left[-x\left\{1 + \frac{y_1^2 + y_2^2}{\nu}\right\}\right]
H_{i-1}\left(\sqrt{\frac{2x}{\nu}}\,y_1\right)H_{i-1}\left(\sqrt{\frac{2x}{\nu}}\,y_2\right)dx
\]

for $i = 1, 2, \ldots$. Here, $\Phi(\cdot)$ is the cdf of the standard normal distribution
and $H_k$ denotes the Hermite polynomial of order $k$ defined by (6.11).
John provided explicit algebraic expressions for $\psi_{\nu,i}$ for $i = 1, 2, \ldots, 6$.
The first three of them are

\[
\psi_{\nu,1}(y_1, y_2) = z^{-\nu/2},
\]

\[
\psi_{\nu,2}(y_1, y_2) = y_1y_2\,z^{-(\nu/2 + 1)},
\]

and

\[
\psi_{\nu,3}(y_1, y_2) = \left(1 + \frac{2}{\nu}\right)y_1^2y_2^2\,z^{-(\nu/2 + 2)} - \left(y_1^2 + y_2^2\right)z^{-(\nu/2 + 1)} + z^{-\nu/2},
\]

where $z = (y_1^2 + y_2^2)/\nu + 1$. In principle, explicit expressions for $\psi_{\nu,i}$ can
be obtained for any $i > 1$. To evaluate $\psi_{\nu,0}$, the integration has to be
done numerically. John tabulated values of this quantity for $\nu = 11, 12$
using Gauss' formula for numerical quadrature (Kopal, 1955, page
371). He also provided several useful recursion relations. For example,
values of $\psi_{\nu,0}(y_1, y_2)$ for $y_1$ negative or $y_2$ negative or both negative can
be found from the formulas

\[
\psi_{\nu,0}(y_1, y_2) = T_\nu(y_2) - \psi_{\nu,0}(-y_1, y_2),
\]

\[
\psi_{\nu,0}(y_1, y_2) = T_\nu(y_1) - \psi_{\nu,0}(y_1, -y_2),
\]

and

\[
\psi_{\nu,0}(y_1, y_2) = 1 + \psi_{\nu,0}(-y_1, -y_2) - T_\nu(-y_1) - T_\nu(-y_2),
\]

where $T_\nu$ is the cdf of the Student's t distribution with $\nu$ degrees of
freedom.


6.4 Amos and Bulgren’s Probability Integrals 


In a widely quoted paper, Amos and Bulgren (1969) derived several 
representations for (6.2) in terms of series and simple one-dimensional 
quadratures, together with efficient computational procedures for the 
special functions used in their numerical evaluation. One of the quadra- 
ture formulas given is 


1 
Qn(v +1)(1 +7? + ¥2)"/2 


x if oF, (1 53 SENI — c? cos? (0 — é)) dô 
= e+» 
Val (y/2)(1 +7 +93)" 
> a I {cos(@ — $) < 0} cos(6 — ¢) 
o {1 - 2 cos?(0 — gy}? 


where >F, is the Gauss hypergeometric function, J{} is the indicator 


function, 
_ “+H 
Cra 2 ae? 
1+ ty 


P = 


, 


Ài 
n = (tml) 
A 
v = m-ni 
I 
EE N ERA 
Lap 


fl 
02 = Ren i 




ps arctan (y2/71) , ify, > 0, 
mw + arctan (y2/n), ify <0, 


1 
Àl = Sa 
1+p 
and 
1 
à = —. 
1-p 


One of the series formulas given is 


c) P(x + k)/2) 


EEE ag E e ie) 
2AT) A 1+ 73 + AA T k)/2) 
62 
: f cos* (0 — ¢)d8. (6.16) 
41 
For the special case v = 1, P can be reduced to the closed-form expres- 
sion 
1 2v 2, 2 
PiS arctan (=) +I {u +v’ < 1}, 
where 
i 2r sin ġ 
~ A(1+r?+2rcosġ)’ 
Ti l-r? 
~ A(1+r?+2rcosġ)’ 
fen VIS 2 
I+VII ER 
and 


= 6. — T 
A = tan ( 5 | 


If in addition $\rho = 0$, then the expression for $P$ reduces further to

\[
P = \frac{1}{2\pi}\left\{\arctan\left(\frac{y_1y_2}{\sqrt{1 + y_1^2 + y_2^2}}\right) + \arctan y_1 + \arctan y_2 + \frac{\pi}{2}\right\}.
\]
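
For $\nu = 1$ and $\rho = 0$ the probability integral is therefore elementary, $P = \frac{1}{2\pi}\{\arctan(y_1y_2/\sqrt{1 + y_1^2 + y_2^2}) + \arctan y_1 + \arctan y_2 + \pi/2\}$. A minimal sketch of this closed form follows; the function name is illustrative.

```python
import math

def bivariate_cauchy_cdf(y1, y2):
    """P(y1, y2) for the bivariate t distribution with nu = 1 and rho = 0."""
    root = math.sqrt(1 + y1 * y1 + y2 * y2)
    return (math.atan(y1 * y2 / root) + math.atan(y1) + math.atan(y2)
            + math.pi / 2) / (2 * math.pi)
```

As one limit tends to infinity the expression tends to the univariate Cauchy cdf $1/2 + \arctan(y)/\pi$, and at the origin it equals $1/4$, in agreement with (6.5).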


The advantage of these expressions over the ones given by Dunnett and
Sobel (1954) is that these are easier to compute, especially for large
degrees of freedom. For instance, the integral in $\theta$ in (6.16) can be
expressed in terms of incomplete beta functions that are extensively
tabulated. Amos and Bulgren (1969) numerically evaluated values of $P$ for
all combinations of $\rho = -0.9, -0.5, 0, 0.5, 0.9$ and $\nu = 1, 2, 5, 10, 25, 50$.


6.5 Steffens’ Noncentral Probabilities 
Consider the p-variate noncentral t distribution defined in (5.1). Motivated
by the Studentized maximum and minimum modulus tests, Steffens
(1970) studied the particular case for $p = 2$ and $R = I_2$. In this
case, the joint pdf (5.1) reduces to


o0 


2 2 oo 
Flees) = exp (848) 2 ry eee) 


2 nT (v/2) == kll! (b+) /2+1 


(v+k+1+2)/2 


x (Vian) (Vez) (142+ 2) 


where $\xi_j = \mu_j/\sigma$ are the noncentrality parameters and $\nu$ denotes the
degrees of freedom. The testing procedures involve maximum or minimum
values of the components $X_1$ and $X_2$ and the computation of the
corresponding probabilities. For this reason, Steffens (1970) derived series
representations for probabilities of the form $P_1 = \Pr(|X_1| < A, |X_2| < A)$
and $P_2 = \Pr(|X_1| > A, |X_2| > A)$. It is seen that


_ G&+& 7/2)" (63/2) 
P, = zep (E P 
n/4 
x i (sin?* v cos” v + sin” v cos”* v) 
0 
xIa(k+1+1, 5) dv 
and 
i grga 8/2" (8/2) 
BAe (A) > eae a 


n/4 
x | (sin?* v cos” v + sin” v cos** v) 
0 


x {1 -Ip (k+1+1,5)} do, 


where $I_\alpha$ denotes the incomplete beta function ratio, $\alpha = A^2\sec^2 v/(\nu +
A^2\sec^2 v)$, and $\beta = A^2\operatorname{cosec}^2 v/(\nu + A^2\operatorname{cosec}^2 v)$. Using these
representations, Steffens estimated values of the critical points $A$ for all
combinations of $\nu = 1, 2, 5, 10, 20, 50, \infty$ and $\xi_1, \xi_2 = 0(1)5$ for the
significance level 0.05. In a more recent development, Bohrer et al. (1982)
developed a flexible algorithm to compute probabilities of the form
$\Pr(c_{11} < X_1 < c_{21}, \ldots, c_{1p} < X_p < c_{2p})$ associated with the noncentral
p-variate t distribution (5.1).


6.6 Dutt’s Probability Integrals 


Dutt (1975) obtained a Fourier transform representation for the probability
integral of a central p-variate t distribution with degrees of freedom
$\nu$ and correlation matrix $R$

\[
P(y_1, \ldots, y_p) = \int_{-\infty}^{y_1}\cdots\int_{-\infty}^{y_p} f(x_1, \ldots, x_p; \nu)\,dx_p \cdots dx_1. \tag{6.17}
\]

Using the definition of multivariate t, one can rewrite (6.17) as

\[
P(y_1, \ldots, y_p) = \frac{1}{2^{\nu/2 - 1}\,\Gamma(\nu/2)}\int_0^{\infty} z^{\nu - 1}\exp\left(-z^2/2\right)G(h_1, \ldots, h_p)\,dz, \tag{6.18}
\]

where $h_k = y_kz/\sqrt{\nu}$, $k = 1, \ldots, p$, and $G$ is the joint cdf of the multivariate
normal distribution with zero means and correlation matrix $R$. In
the case $y_k = 0$ for all $k$, one has $P$ independent of $\nu$ and

\[
P(0, \ldots, 0) = G(0, \ldots, 0).
\]
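
The scale-mixture representation (6.18), $P = \{2^{\nu/2-1}\Gamma(\nu/2)\}^{-1}\int_0^\infty z^{\nu-1}e^{-z^2/2}\,G(y_1z/\sqrt{\nu}, \ldots, y_pz/\sqrt{\nu})\,dz$, is easy to try out in one dimension, where $G$ is just the standard normal cdf and the result must be the Student t cdf. The sketch below uses Simpson's rule on a truncated range rather than the Gauss-Hermite quadrature discussed in this section; the truncation point, the number of panels, and the function names are assumptions made for illustration.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def t_cdf_via_mixture(y, nu, upper=12.0, n=4000):
    """(6.18) with p = 1 for integer nu >= 1, integrated by Simpson's rule on [0, upper]."""
    const = 1 / (2 ** (nu / 2 - 1) * math.gamma(nu / 2))

    def integrand(z):
        return z ** (nu - 1) * math.exp(-0.5 * z * z) * Phi(y * z / math.sqrt(nu))

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return const * total * h / 3
```

For $\nu = 1$ and $\nu = 2$ the result can be compared with the closed-form cdfs $1/2 + \arctan(y)/\pi$ and $1/2 + y/(2\sqrt{2 + y^2})$.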


Explicit forms of G for p = 2, 3, 4 in terms of the D-functions are given in 
Dutt (1973). The D-functions are integral forms over (—0o, 00) defined 
by 


|i*| oo oe) di, 
Dg (t1,.--, tp; R) = oar | sane Sı 
—00 —0o 


k k 
x exp (Zes — Soe 2) dsk `- ds1, 
1=0 1=0 
where the first five d are 
dist, 
dg = di2, 
dz = dy2413423 — (di2 + diz + dos) , 
d4 = —Gy2413423414424434 + di2413423 + d124+14+24 + di3414434 




+d23424434 — (diz + diz + do3 + dig + doa + d34), 
ds = —d12413+23+24+34+15+25+35+45 + 12413423414424434 

+12413423415425435 + d12+14+24+15+25+45 

+di3+14+34+15+35+45 + d23+24+34+25+35+45 

= (di2413423 + di2+14+24 + d124+15+25 + di3414434 

+d13415435 + d144+15+45 + do3424434 + do3425435 


+do4425445 + d34435445) + dio + di3 +++: + dys, 


and 


dpiqit--+pmam = 1l- exp {- (rpq Sp: Sq He +8 pmam Spm Sam) Y- 


Using the notation

\[
D_{k:j_1,\ldots,j_k} = D_k\left\{t_{j_1}, \ldots, t_{j_k}; R\left(t_{j_1}, \ldots, t_{j_k}\right)\right\},
\]

where $R(t_{j_1}, \ldots, t_{j_k})$ is the correlation matrix based on the subscripts
$j_1, \ldots, j_k$, Dutt (1973) provided the following explicit forms for $G$

\[
G(t_1, t_2) = \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\} + D_{2:1,2},
\]

\[
\begin{aligned}
G(t_1, t_2, t_3) = {} & \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\}\{1 - \Phi(t_3)\} \\
& + \{1 - \Phi(t_1)\}D_{2:2,3} + \{1 - \Phi(t_2)\}D_{2:1,3} \\
& + \{1 - \Phi(t_3)\}D_{2:1,2} + D_{3:1,2,3},
\end{aligned}
\]

and

\[
\begin{aligned}
G(t_1, t_2, t_3, t_4) = {} & \prod_{k=1}^{4}\{1 - \Phi(t_k)\} + \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\}D_{2:3,4} \\
& + \{1 - \Phi(t_1)\}\{1 - \Phi(t_3)\}D_{2:2,4} \\
& + \{1 - \Phi(t_2)\}\{1 - \Phi(t_3)\}D_{2:1,4} \\
& + \{1 - \Phi(t_1)\}\{1 - \Phi(t_4)\}D_{2:2,3} \\
& + \{1 - \Phi(t_2)\}\{1 - \Phi(t_4)\}D_{2:1,3} \\
& + \{1 - \Phi(t_3)\}\{1 - \Phi(t_4)\}D_{2:1,2} \\
& + \{1 - \Phi(t_1)\}D_{3:2,3,4} + \{1 - \Phi(t_2)\}D_{3:1,3,4} \\
& + \{1 - \Phi(t_3)\}D_{3:1,2,4} + \{1 - \Phi(t_4)\}D_{3:1,2,3} \\
& + D_{4:1,2,3,4}.
\end{aligned}
\]

A much simplified representation for $G$ in terms of the error function,
erf(·), and integral forms over $(0, \infty)$, denoted as the $D^*$-functions, is
given in a later paper by Dutt (1975). These $D^*$-functions are defined
by


ee ee E 
k (tis---tp; R) s, (27)F 5 ds 0o S1°''Sk 


k 
x exp (- ya /2) ds, --- ds}, (6.19) 
i=0 


where for the first few k are 


dă = sin(tisı), 
d, = 
2 = €—12COS1—2 — €12 COS1 +2, 
k 
d3 = €12+13+23+14+24+34 COS1424344 


+e12—13—23—14—24+34 COS—1~2+3+4 
+e—12+13—23—14+24—34 COS_142-344 
+e—12—13+23+14—24—34 COS1—2—3+4 
—E—12—13+23—14+24+34 COS_14243+44 
—€~12+413-23414—24434 COS1—2+3+4 
—€12-13~23414+4+24—34 COS1+2+3+4 


—€12413+23-14—24—34 COS1+2+3—4 


and for notation 


Cprarttpmam = EXP È- (pia Spı Saa +3 + Tpm am Spm Sam) J > 
SiINp,+--+pm = SİN (tp Sp, +t + tpm Spm) > 
COSpi+:+pm = COS (tpi Sp, +++ + pm Spm): 


(A negative sign on the index pıqı corresponds to +1p, 9, Spı Sq and —P1 
corresponds to —tp; Sp, -) Important special cases of these functions are 


1 yY 
D* = -erf{—}], 
D3 (0,0;R) = Bs arcsin (r12) 
2 Vs = on 12); 
and 
D,(0;R) = 0, for k odd. 


Using the abbreviation that 


De tae = D% {tj,,--- ty R (tjs -3 tjr) Jo 




Dutt (1975) provided the following representation for G 


1 p 1 p—l1 P 1 p-2 p 
G (tisto) = G) os G) X Dia + (5) X Diu 
k=1 k<l=1 
1 p-3 Pp 
+ G) ye D3:kim Feit Dp:i,...,p- 
k<l<m=1 

Hence, by (6.18), the computation of $P$ in (6.17) can be achieved by
successive applications of the Gauss-Hermite quadrature formula using
only positive Hermite zeros (Abramowitz and Stegun, 1964, page 924).
There are several advantages to this approach. First, it is not necessary
to invert the correlation matrix. In addition, (6.19) permits the use of
Gauss quadrature formulas that are remarkably effective in estimating
the value of an integral from a few points, provided that the integrand
excluding the weighting function can be accurately approximated by a
polynomial. Moreover, often the integrand separates as a product of two
functions, one depending only on correlation coefficients and the other
on the original limits of integration.

For selected correlation structures and several values of $\nu$ and $y_k = y$,
$k = 1, \ldots, p$, Dutt (1975) computed values of $P$ accurate up to six
decimal places.


6.7 Amos’ Probability Integral 


For the equicorrelation structure $r_{ij} = \rho$, $i \ne j$, considered by Gupta and
Sobel (1957) and Gupta (1963), but with the common $\rho$ taken to be
any positive real number less than 1, Amos (1978) derived the following
simpler expression for the probability integral


_ XAT ((v +1)/2) [” _ dz? 
P(d) = Fat Ry? i: exp ( 9 ) 


CE 
x 6?(x)erfc | -—= | dz, 6.20 
ete ( a) oe) 
where erfc(-) is the complementary error function defined by 


erfe(z) = =f exp (—z”) dz 
VT Jz 
and a, b, c, d are constants given by 


Lp 


a= — 


p 


3 




hp = OE 
VIZA 

ab 
V1+ 8’ 


a? 


1+62° 


The reduction to (6.20) was obtained by means of a relationship between
the parabolic cylinder function and the complementary error function.
Amos (1978) suggested computing the integral (6.20) by locating the
point $x_0$ at which the derivative of the integrand is zero and then
summing quadratures on intervals of length $h$ to the left and right of
$x_0$ until a limit of integration is reached or the truncation error is
small enough. The motivation for this procedure is that $x_0$ can vary
widely with extreme parameter values, and $h$, which estimates the spread
of the integrand, can be small or large. Thus, $x_0$ and $h$ adapt to
the parameters, which avoids quadratures over negligible tails and gross
misjudgments of the scale of integration. Letting $g(x)$ denote the
integrand of (6.20), Amos (1978) showed that the derivative of
$\log g(x)$ decreases monotonically from $\infty$ to $-\infty$ as $x$
traverses $(-\infty, \infty)$, guaranteeing a unique root $x_0$ of
$g'(x) = 0$.
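The mode-centered strategy can be illustrated with a short sketch. This is not Amos' routine: the integrand below is a hypothetical stand-in, $\exp(-x^2/2)\,\Phi^p(x)$, chosen only because its log-derivative is decreasing and its integral is known in closed form, $\sqrt{2\pi}/(p+1)$. The function names, the bisection bracket, the curvature-based choice of $h$, and the Simpson panels are all assumptions made for the illustration.

```python
import math

P_EXP = 3  # hypothetical exponent p of the stand-in integrand

def Phi(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g(x):
    """Stand-in integrand exp(-x^2/2) * Phi(x)^p (not the integrand of (6.20))."""
    return math.exp(-0.5 * x * x) * Phi(x) ** P_EXP

def dlog_g(x, eps=1e-5):
    """Central-difference derivative of log g."""
    return (math.log(g(x + eps)) - math.log(g(x - eps))) / (2.0 * eps)

def find_mode(lo=-10.0, hi=10.0, iters=200):
    """Bisection on d/dx log g, which is decreasing for this integrand."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlog_g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(a, b, n=16):
    """Composite Simpson rule for g on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0

def integrate_from_mode(rel_tol=1e-12, max_panels=1000):
    """Sum panels of length h to the right and left of the mode x0."""
    x0 = find_mode()
    e = 1e-3
    curv = (math.log(g(x0 + e)) - 2.0 * math.log(g(x0))
            + math.log(g(x0 - e))) / e ** 2
    h = 1.0 / math.sqrt(-curv)        # spread estimate from curvature at x0
    total = 0.0
    for sign in (1.0, -1.0):          # sweep right, then left
        a = x0
        for _ in range(max_panels):
            b = a + sign * h
            piece = simpson(min(a, b), max(a, b))
            total += piece
            a = b
            if piece <= rel_tol * total:   # remaining tail is negligible
                break
    return x0, h, total
```

Starting at the mode means the panel length is tied to the actual scale of the integrand, so the stopping rule is reached after a handful of panels regardless of where the mass sits.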


6.8 Fujikoshi’s Probability Integrals 


Fujikoshi (1988) provided asymptotic expansions as well as error bounds 
for the probability integral (6.17) when the correlation matrix R = Ip, 
the p x p identity matrix. Specifically, letting 


di F E 
Qô, j (Y1, ----Yp) = dsi {e (s Hy) a (s En 
where $\delta = -1, 1$, and $\Phi$ denotes the cdf of the standard normal
distribution, Fujikoshi established the following approximation for the
probability integral


, 
s=1 


k-1 
1 

Pith einaey = P (ys) (Up) + D459 (Wa --¥e) 
jail 


a(-y ay 




which we shall denote by $A_{\delta,k}(y_1, \ldots, y_p)$. Fujikoshi also
derived uniform and nonuniform error bounds for this approximation. Under
the assumptions that


ās = supļ|ask (y1,--.,Y¥p)| < 00, 
y 


PCF) fem CG) Jee 


the uniform bound takes the form 


and 


sup |P (41,---Y¥p) — Ase (yr,--+s Yp)l 
y 
2 k 
EG- 
v x2 
G5n(l) = up (1+ Ily II!) lase (y1,---.¥p)| < 00 


(e A 


the nonuniform bound takes the form 


1 
< gE 


Under the assumptions that 


and 


|P (Y1, ---,Yp) — Abe (Y1, ---; Yp) 
1 £ 2\ 1/2 
< gly) ‘auoe ($) x 


k 
+ 


2 
Xv _4 2 
v v 2 
Vv 


J 
Clearly the latter bounds are improvements on the uniform bounds in
the tail part of the multivariate t distribution. In the case p = 1, these
results provide useful approximations for the univariate Student's t
distribution; see Fujikoshi (1987) and Fujikoshi and Shimizu (1989). The
special case of (6.21) for $y_j = y$ has been investigated more recently
by Fujikoshi (1988, 1989, 1993), Fujikoshi and Shimizu (1990), and
Shimizu and Fujikoshi (1997).


6.9 Probabilities of Cone 


Consider the p-dimensional set 


$$A_r(c) = \{x : z^T x \le r \|z\| \text{ for all } z \in E(c)\}, \qquad (6.22)$$


Fig. 6.1. The sets $A_r(c)$ and $E(c) \cap \{\|z\| = r\}$ in two dimensions


where $E(c) = \{z : z_1 \ge c\|z\|\}$, $\|z\| = \sqrt{z^T z}$, and c is a
nonnegative constant. The set E(c) is the cone, with vertex at the origin,
which intersects origin-centered spheres in spherical caps. This is
illustrated in Figure 6.1 for p = 2.

Bohrer (1973) studied the analytical shape of A,(c) and the associated 
probability 


$$p(c, r, p, \nu) = \Pr(X \in A_r(c))$$


when X has the p-variate t distribution with mean vector 0, covariance
matrix $\sigma^2 I_p$, and degrees of freedom $\nu$. The evaluation of
$p(c, r, p, \nu)$ is of statistical interest and use in the construction
of confidence bounds (Wynn and Bloomfield, 1971, Section 3; Bohrer and
Francis, 1972, equation (2.3)) and in testing multivariate hypotheses
(Kudô, 1963, Theorem 3.1, Section 5; Barlow et al., 1972, pages 136ff, 177).




As regards the shape, Bohrer showed that every two-dimensional section
of $A_r$ containing the $z_1$-axis is exactly the two-dimensional version
of $A_r$ illustrated in Figure 6.1. Thus, $A_r$ is the solid of revolution
about the $z_1$-axis that is swept out by the $A_r$ in Figure 6.1. To
express this more precisely in mathematical terms, for a $p \times 1$
vector $v$ define polar coordinates $R_v$ and
$\theta_v = \{\theta_{vi}\}$, with $-\pi < \theta_{vi} \le \pi$, by

$$v_1 = R_v \cos\theta_{v1},$$

$$v_i = R_v \cos\theta_{vi} \prod_{j=1}^{i-1} \sin\theta_{vj}, \qquad i = 2, \ldots, p-1,$$

and

$$v_p = R_v \prod_{j=1}^{p-1} \sin\theta_{vj}.$$

Also define

$$\theta^* = \arccos c,$$
$$T_1 = \{x : |\theta_{x1}| \le \theta^*,\ R_x \le r\},$$
$$T_2 = \{x : \theta_{x1} - \theta^* \in (0, \pi/2],\ R_x \cos(\theta_{x1} - \theta^*) \le r\},$$
$$T_3 = \{x : \theta_{x1} + \theta^* \in [-\pi/2, 0),\ R_x \cos(\theta_{x1} + \theta^*) \le r\},$$

and

$$T_4 = \{x : |\theta_{x1}| > \theta^* + \pi/2\}.$$

Then the set $A_r$ is the union of the disjoint sets $T_1, \ldots, T_4$. As
regards evaluating the probability $p(c, r, p, \nu)$, Bohrer (1973) derived
the following expression


O RØ) r? k (1/2 — 0*) 
plc,r,p,v) = TED (Fe < =) + e/a) 
ae j+1 p-j-1)\ (p-2 
KG 2 4S. 2 ee 


cone 2 
xo! (1- 7)” a (Aaa < =) , 
oO 


where $k(\theta)$ is given by


2™m!)? in?” 
TEE ( r stehe cos ĝ sin“™ 0 
X sin2(m—) saat jy p-— 2j 

= p-—2l ra al ea 
when p = 2m +1 is odd and by 


(p—1)!9  _ sinP™?—?™ 9 cos0 
22-1 (m — 1)!m! p 


S sin?-!-! A cosð a7 p +1 -2j 


p—2l ja PH 2-25 


k(6) = 


when p = 2m is even. The statistical questions that motivate this work
ask what radius r is required so that $p(c, r, p, \nu) = \alpha$ for
preassigned values of $\alpha$. For p < 5, Bohrer (1973) provided tables of
these percentiles for $\alpha$ = 0.95 and 0.99 and for a range of
$(c, \nu)$ pairs.
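For p = 2 the decomposition of $A_r(c)$ into $T_1, \ldots, T_4$ can be checked numerically against the defining condition (6.22). The sketch below is only an illustration: the function names are hypothetical, the polar angle is taken as `atan2(x2, x1)`, and the brute-force check scans a finite grid of unit vectors in the cone rather than all of E(c).

```python
import math

def in_A_partition(x1, x2, r, c):
    """Membership in A_r(c) for p = 2 using the sets T_1, ..., T_4."""
    theta_star = math.acos(c)
    R = math.hypot(x1, x2)
    theta = math.atan2(x2, x1)                  # polar angle in (-pi, pi]
    if abs(theta) <= theta_star:                # T_1: x lies inside the cone
        return R <= r
    if abs(theta) > theta_star + math.pi / 2:   # T_4: x makes an obtuse angle
        return True                             # with every z in the cone
    # T_2 (theta > 0) and T_3 (theta < 0) are mirror images of each other
    return R * math.cos(abs(theta) - theta_star) <= r

def in_A_bruteforce(x1, x2, r, c, n=2000):
    """Check z^T x <= r for unit vectors z on a grid of directions in E(c)."""
    theta_star = math.acos(c)
    for i in range(n + 1):
        phi = -theta_star + 2.0 * theta_star * i / n
        if math.cos(phi) * x1 + math.sin(phi) * x2 > r + 1e-12:
            return False
    return True
```

With c = 0.5 and r = 1, for example, the point (-5, 0) belongs to $A_r(c)$ through $T_4$ even though it is far from the origin, while (1.5, 0) is excluded through $T_1$.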


6.10 Probabilities of Convex Polyhedra 


It is well known (Nicholson, 1943; Cadwell, 1951; Owen, 1956) that
probabilities of polygons under bivariate normal distributions can be
evaluated in terms of probabilities of right-angled triangles with vertices
$(0, 0)$, $(y_1, 0)$, $(y_1, y_2)$, $y_j > 0$, $j = 1, 2$, under bivariate
normal distributions with zero correlation. John (1964) proved an analogous
result that probabilities of polygonal and angular regions for a given
bivariate t distribution can be expressed in terms of $V_\nu(y_1, y_2)$,
the integral of

$$f(x_1, x_2; \nu) = \frac{\Gamma((\nu + 2)/2)}{\nu\pi\,\Gamma(\nu/2)} \left[1 + \frac{x_1^2 + x_2^2}{\nu}\right]^{-(\nu+2)/2}$$

over the right-angled triangle with vertices $(0, 0)$, $(y_1, 0)$, and
$(y_1, y_2)$. John (1964) also provided several formulas for evaluating
$V_\nu(y_1, y_2)$. A formula in terms of the incomplete beta function is


1 
Vp (y1,y2) = pP arctan (2) 


në 11 


k v 
J c" By (zrehi) (623) 
An Jv Hy? a 2 22 




where 
= v 
vty?’ 
= v+ y? 
v +y ty 
and 


1 
B,(a,b) = [wa wide 


is the incomplete beta function. This series converges slowly unless $y_1$
is large in relation to $\nu$. In the two cases of $\nu$ odd and $\nu$
even, (6.23) can be reduced considerably. If $\nu = 2m$ for a positive
integer m, then


VIe 11 
Vom (¥1,Yy2) = any ee ye cB, (i + z 5) (6.24) 
k=0 


while if v = 2m + 1 for a nonnegative integer m, then 


1 Yo 1 1 1 
Vom+1 (Yi, Y2) = zp arctan (2) — qP” (G 3 
c(l—c) 1 
+ — Di eB, (k+1,5 } , (6.28) 
k=0 
where 


v (v +y? +y) 
v(v +y? + y3) tyy 


v = 


An attractive feature of (6.24) and (6.25) is that, when utilizing them for
evaluating $V_{2m}$ and $V_{2m+1}$, they are already evaluated for lower
values of m also. If one performs the summations in the order indicated in
the formulas, the addition of each term will yield values of $V_{2m}$ or
$V_{2m+1}$ for the next higher value of m. This feature makes them
particularly suitable for use in preparing tables.

A second formula for $V_\nu(y_1, y_2)$ given in John (1964) is an expansion
in powers of $1/\nu$

2 © ik 

i » Un (y1, 2) 5 


k=1 


V, (yi, Y2) = Væ (y1,y2) z 


(6.26) 


where the first three $U_k$ are given by


4 
UY, (y1, y2) = “w, (y1, y2) , 


1 y? 
U> (y1 y2) = Bi = zm (yi, Y2) + TUL (y1, y2) \ 
and 
_ 3)3 y? yi 
; = 1 ; ; ; ; 
Us (y1, ya) y 1 (yi, y2) -4 (yi, Y2) + 64 (y1: Y2) 


where 


y2/Y1 A y?t? 
W, (m,y2) = I (1+) exp (-4 ) dt. 
0 


The term $V_\infty$ in (6.26) is the integral of
$\exp\{-(x_1^2 + x_2^2)/2\}/(2\pi)$ over the right-angled triangle with
vertices $(0, 0)$, $(y_1, 0)$, and $(y_1, y_2)$. The method of derivation
for (6.26) is similar to the classical method employed by Fisher (1925) for
expanding the probability integral of Student's t. Although (6.26) is more
complicated than (6.23), it should be preferred if $\nu$ is sufficiently
large. The first two or three terms of (6.26) then can be expected to
provide fairly accurate values of $V_\nu$.

John (1964) also provided a recurrence relation and an approximation
for $V_\nu(y_1, y_2)$; the latter proved to be satisfactory only when
either $\nu$ is too small or $y_2/y_1$ is too large. In a subsequent paper,
John (1966) extended this result to higher dimensions by showing that the
probabilities of the p-dimensional convex polyhedra with vertices
$(0, 0, \ldots, 0)$, $(y_1, 0, \ldots, 0)$, $(y_1, y_2, 0, \ldots, 0)$,
$\ldots$, $(y_1, y_2, \ldots, y_p)$, $y_j > 0$, $j = 1, 2, \ldots, p$,
under a p-variate t distribution with $\nu$ degrees of freedom can be
expressed in terms of the function $V_\nu(y_1, y_2, \ldots, y_p)$, the
integral of the p-variate t pdf

$$f(x_1, x_2, \ldots, x_p; \nu) = \frac{\Gamma((\nu + p)/2)}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)} \left[1 + \frac{x_1^2 + x_2^2 + \cdots + x_p^2}{\nu}\right]^{-(\nu+p)/2}$$

over the same p-dimensional convex polyhedra. John also provided an
important asymptotic expansion in powers of $1/\nu$ connecting
$V_\nu(y_1, y_2, \ldots, y_p)$ with $V(y_1, y_2, \ldots, y_p)$, the
integral of the p-variate normal pdf

$$f(x_1, x_2, \ldots, x_p; \infty) = (2\pi)^{-p/2} \exp\left\{-\left(x_1^2 + x_2^2 + \cdots + x_p^2\right)/2\right\}$$

over the same polyhedra discussed above. Up to the order of the term
$O(1/\nu^2)$, the expansion is
V, (yi, Ye; KE Up) 

1 

4v 

-=y (1 +y?) f(y) V (Y2, ¥s;--+4p) } 


= V (Yi, Y2,- --,Yp) + {uiyaf (142) V (Ys, Yas: -s Yp) 


1 
+ga [3nvvsyaf (u Y2y3: y4) V (Y5, Y6,- -- Yp) 


—yiyoys (2 + 9y? + 6ys + 3y3) f (y1: Y2,y3) V (Ya, -- +5 Yp) 
—yiye (3 + 5y7 +y3 — 9yí — 9y7ys — 3v2) 

XV (y3,---5¥p) f (Y1:Y2) 
+n (3+ 5y? + Tuf — 3y) F (1) V (zs. } 


+0(Z). 


In this formula, $V(y_m, y_{m+1}, \ldots, y_p)$ is to be replaced by 0 if
$m \ge p + 2$ and by 1 if $m = p + 1$. In principle, there is no difficulty
in determining further terms of this expansion, but the coefficients of
higher powers of $1/\nu$ have rather complicated expressions. Other useful
results given by John (1966) include recursion formulas connecting
$V_\nu(y_1, y_2, \ldots, y_p)$ with $V_{\nu+2}(y_1, y_2, \ldots, y_p)$.

More recently, several authors have looked into the problem of computing
multivariate t probabilities of the form

$$P = \int_A f(x)\,dx, \qquad (6.27)$$

where X has the central multivariate t distribution with correlation
matrix R and A is any convex region. Somerville (1993a, 1993b, 1993c,
1994) developed the first known procedures for evaluating P in (6.27).
Let $MM^T$ be the Cholesky decomposition of R (where M is a lower
triangular matrix) and set X = MW. Then W is multivariate t with
correlation matrix $I_p$. If one further sets $r^2 = W^T W$, then
$F = r^2/p$ has the well known F distribution with degrees of freedom p
and $\nu$. Let A be the region bounded by p hyperplanes and described by

$$G^T W \le d,$$

where $G = (g_1, \ldots, g_p)$ and the jth hyperplane is $g_j^T W = d_j$.
For a random direction c, let r be the distance from the origin to the
boundary of A along c, that is, the smallest positive distance from the
origin to the jth plane, $j = 1, \ldots, p$. Then an unbiased estimate of
the integral P in (6.27) is

$$\Pr\left(F \le r^2/p\right). \qquad (6.28)$$

To implement the procedure, Somerville chose successive random directions
c and obtained the corresponding estimates (6.28). The value of P was then
taken as the arithmetic mean of the individual estimates.
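The direction-sampling idea can be sketched for p = 2, where no special-function library is needed because the F(2, ν) cdf has the closed form $1 - (1 + 2f/\nu)^{-\nu/2}$. The sketch below is an illustration under stated assumptions, not Somerville's code: it works directly in the W coordinates (identity correlation), assumes every $d_j > 0$ so that the origin lies inside A, and all function names are hypothetical.

```python
import math
import random

def radial_cdf(r, nu):
    """Pr(F <= r^2/p) for p = 2, using the closed-form cdf of F(2, nu)."""
    if math.isinf(r):
        return 1.0
    return 1.0 - (1.0 + r * r / nu) ** (-nu / 2.0)

def boundary_distance(c, G, d):
    """Distance from the origin along the unit direction c to the boundary of
    {w : g_j^T w <= d_j for all j}; infinite if the ray never leaves."""
    r = math.inf
    for g, dj in zip(G, d):
        slope = g[0] * c[0] + g[1] * c[1]
        if slope > 0.0:                 # the ray moves toward this hyperplane
            r = min(r, dj / slope)
    return r

def direction_estimate(G, d, nu, n_dir=20000, seed=1):
    """Average of the per-direction estimates (6.28) over random directions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_dir):
        ang = rng.uniform(0.0, 2.0 * math.pi)
        c = (math.cos(ang), math.sin(ang))
        total += radial_cdf(boundary_distance(c, G, d), nu)
    return total / n_dir
```

A simple check is the half-plane $w_1 \le 1$ with ν = 2, whose exact content is the univariate Student's t cdf $T_2(1) = 1/2 + 1/(2\sqrt{3}) \approx 0.7887$; `direction_estimate([(1.0, 0.0)], [1.0], 2.0)` should land close to that value.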

Somerville (1997, 1998b) provided the following modification of the
above procedure. Let $r^*$ be the minimum distance from the origin to
the boundary of A, that is, the smallest of the r over all directions c.
Divide A into two regions: the portion inside the hypersphere of radius
$r^*$ centered at the origin, and the region outside. The probability
content of the hypersphere is

$$P_1 = \Pr\left(F \le r^{*2}/p\right),$$

and this can be estimated as in Somerville (1993a, 1993b, 1993c, 1994).
If E(v) and e(v), respectively, denote the cdf and the pdf of $v = 1/r$
(the reciprocal of the distance from the origin to the boundary of A),
then the probability content of the outer region is

$$P_2 = \int_0^{1/r^*} E(v)\,e(v)\,dv.$$

Since $F = r^2/p$, the pdf of v is

$$e(v) = \frac{2\nu^{\nu/2}\,\Gamma((\nu + p)/2)}{\Gamma(\nu/2)\,\Gamma(p/2)}\, \frac{v^{\nu-1}}{\left(1 + \nu v^2\right)^{(\nu+p)/2}}.$$

The strategy is to use some numerical method to estimate E(v) and then
evaluate the integral $P_2$ using Gauss-Legendre quadrature. The
approaches of Somerville (1997, 1998b) differ in that Somerville (1997)
applied Monte Carlo techniques to estimate E(v) while Somerville (1998b)
used a binning procedure. It should be noted, however, that an approach
similar to these had been introduced earlier by Deak (1990).

Somerville (1999a) provided an extension of the above methodologies
to evaluate P in (6.27) when A is an ellipsoidal region. This has
potential applications in the field of reliability (in particular relating
to the computation of the tolerance factor for multivariate normal
populations) and to the calculation of probabilities for linear
combinations of central and noncentral chi-squared and F. In the coordinate
system of the transformed variables W, assume, without loss of generality,
that the axes of the ellipsoid are parallel to the coordinate axes and the
ellipsoid has the equation $(w - u)^T B^{-1} (w - u) = 1$, where B is a
diagonal matrix with the ith diagonal element given by $b_i$. If the
ellipsoid contains the origin, then for each random direction c there is a
unique distance r to the boundary. An unbiased estimate of P is then given
by

$$\Pr\left(F \le r^2/p\right).$$

If the ellipsoid does not contain the origin, then, for a random direction,
a line from the origin in that direction either intersects the boundary of
the ellipsoid at two points (say at distances $r_1 > r_2$) or does not
intersect it at all. If the line intersects the boundary, then an unbiased
estimate of P is given by the difference

$$\Pr\left(F \le r_1^2/p\right) - \Pr\left(F \le r_2^2/p\right).$$

If the line does not intersect the ellipsoid, an unbiased estimate is 0.
As in the first procedure described above, this is repeated for successive
random directions c, each providing an unbiased estimate. The value
of P is then taken as the arithmetic average. A modification of this
procedure along the lines of Somerville (1997, 1998b) is described in
Somerville (1999a).
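The two-intersection estimate for an ellipsoid that excludes the origin can be sketched in the same spirit for p = 2. Everything here is an illustrative assumption rather than Somerville's implementation: the closed-form F(2, ν) cdf is used for the radial probability, the intersections come from solving the quadratic in the distance t along the ray, and a direct simulation of the bivariate t vector (normal pair divided by $\sqrt{\chi^2_\nu/\nu}$) serves only as a cross-check.

```python
import math
import random

def radial_cdf(r, nu):
    """Pr(F <= r^2/p) for p = 2, using the closed-form cdf of F(2, nu)."""
    return 1.0 - (1.0 + r * r / nu) ** (-nu / 2.0)

def ray_hits(c, u, b):
    """Distances (r1 > r2 > 0) at which the ray t*c, t > 0, meets the ellipse
    sum_i (w_i - u_i)^2 / b_i = 1, or None if the ray misses it."""
    A = sum(ci * ci / bi for ci, bi in zip(c, b))
    Bq = sum(ci * ui / bi for ci, ui, bi in zip(c, u, b))
    C = sum(ui * ui / bi for ui, bi in zip(u, b)) - 1.0  # > 0: origin outside
    disc = Bq * Bq - A * C
    if Bq <= 0.0 or disc <= 0.0:
        return None
    root = math.sqrt(disc)
    return (Bq + root) / A, (Bq - root) / A

def direction_estimate(u, b, nu, n_dir=20000, seed=2):
    """Average of Pr(F <= r1^2/p) - Pr(F <= r2^2/p), or 0 when the ray misses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_dir):
        ang = rng.uniform(0.0, 2.0 * math.pi)
        hits = ray_hits((math.cos(ang), math.sin(ang)), u, b)
        if hits is not None:
            r1, r2 = hits
            total += radial_cdf(r1, nu) - radial_cdf(r2, nu)
    return total / n_dir

def direct_estimate(u, b, nu, n=40000, seed=3):
    """Cross-check: simulate W = Z / sqrt(chi2_nu / nu) and count hits."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        w = (rng.gauss(0.0, 1.0) / s, rng.gauss(0.0, 1.0) / s)
        inside += sum((wi - ui) ** 2 / bi for wi, ui, bi in zip(w, u, b)) <= 1.0
    return inside / n
```

For a small ellipse away from the origin most rays miss the region and contribute exactly 0, which is why the per-direction estimates have a much smaller spread than the 0/1 indicators of the direct simulation.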

Somerville (1999b) provided an application of his methods for multiple 
testing and comparisons by taking A in (6.27) to be 


A = {x € RP :maxe?x < q/v3}, ce B, 


where B is the set of contrasts corresponding to the different hypotheses
and q > 0. The purpose is to calculate the value of q for arbitrary
R and $\nu$ and arbitrary sets B such that the probability content of A
has a preassigned value $\gamma$. Somerville and Bretz (2001) have written
two Fortran 90 programs (QBATCH4.FOR and QINTER4.FOR) and
two SAS-IML programs (QBATCH4.SAS and QINTER4.SAS) for this
purpose. QINTER4.FOR and QINTER4.SAS are interactive programs,
while the other two are batch programs. A compiled version of the
Fortran 90 programs that should run on any PC with Windows 95 or
later can be found at


http://pegasus.cc.ucf.edu/~somervil/home.html


These programs implement the methodology described above to evaluate
the probability content of A. (A Fortran 90 program MVI3.FOR used to
evaluate multivariate t integrals over any convex region is described in
Somerville (1998a). An extended Fortran 90 program MVELPS.FOR to
evaluate multivariate t integrals over any ellipsoidal region is described
in Somerville (2001). The average running times for the latter program
range from 0.075 and 0.109 second for p = 2 and 3, respectively, to
0.379 and 0.843 second for p = 10 and 20, respectively.) The so-called
“Brent’s method,” an iterative procedure described in Press (1986), is
used to solve for the value of q. The time to estimate the q values (with
a standard error of 0.01) using QINTER4 or QBATCH4 ranges from 10
seconds for Dunnett’s multiple comparisons procedure to 52 seconds for
Tukey’s procedure, using a 486-33 processor.

A problem that frequently arises in statistical analysis is to compute
(6.27) when A is a rectangular region, that is,

$$P = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_p}^{b_p} f(x_1, x_2, \ldots, x_p)\,dx_p \cdots dx_2\,dx_1. \qquad (6.29)$$

Wang and Kennedy (1997) employed numerical interval analysis to compute
P. The method is similar to the approaches of Corliss and Rall (1987) for
univariate normal probabilities and Wang and Kennedy (1990) for bivariate
normal probabilities. The basic idea is to apply the multivariate Taylor
expansion to the joint pdf f. Letting $c_j = (a_j + b_j)/2$, the Taylor
expansion of f at the midpoint $(c_1, c_2, \ldots, c_p)$ is
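$$f(x_1, x_2, \ldots, x_p) = \sum_{k=0}^{m-1} \sum_{]k[} \frac{1}{k_1! \cdots k_p!}\, \frac{\partial^k f(c_1, c_2, \ldots, c_p)}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}} \prod_{j=1}^{p} (x_j - c_j)^{k_j} + \sum_{]m[} \frac{1}{k_1! \cdots k_p!}\, \frac{\partial^m f(\xi_1, \xi_2, \ldots, \xi_p)}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}} \prod_{j=1}^{p} (x_j - c_j)^{k_j}, \qquad (6.30)$$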


where $\xi_j$ is contained in the integration region $[a_j, b_j]$ and
$]k[$ denotes all possible partitions of k into p parts. For example, in
the case p = 3, $]2[$ will result in 6 possible partitions of ‘2’ into
$\{k_1, k_2, k_3\}$: {0,0,2}, {0,1,1}, {0,2,0}, {1,0,1}, {1,1,0}, and
{2,0,0}. The main problem with computing (6.30) is the presence of
high-order partial derivatives of f. Defining

$$(f)_{k_1 k_2 \cdots k_p} = \frac{1}{k_1!\,k_2! \cdots k_p!}\, \frac{\partial^{k_1 + k_2 + \cdots + k_p} f}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}}, \qquad (6.31)$$




Wang and Kennedy derived the following recursive formula 


1 xTR-!x\ + 
(fkrko--ky = ee (=>) 


( =E) 
x ae . 
u kı—lı,k2—l2,...,kp—lp 
With regard to the last quadratic term, it should be noted that higher
than second-order partial derivatives are all zero. To carry out the
computation of (6.31) for a given $(k_1, k_2, \ldots, k_p)$, one can

• first let one $l_i$ be $k_i - 1$ (if this $k_i \ge 1$) and all the other
  $l_j$'s be their corresponding $k_j$'s;

• next let $l_r$ and $l_s$ be $k_r - 1$ and $k_s - 1$, respectively (if
  $k_r \ge 1$ and $k_s \ge 1$), while all the other $l_j$'s take their
  corresponding $k_j$'s;

• finally, let some $l_j$ be $k_j - 2$ (if $k_j \ge 2$) and all other
  $l_i$'s be the corresponding $k_i$'s.


The total number of terms that contribute to computing
$(f)_{k_1 k_2 \cdots k_p}$ is at most $p(p + 3)/2$. Compared to the
multivariate normal distribution, this number is larger (Wang and Kennedy,
1990). The following table gives the running times and the accuracy for
computing (6.29) with $\nu = 10$.


Running time and accuracy for computing P in (6.29) 


p   Running      a_j = -0.5   a_j = -0.4   a_j = -0.3   a_j = -0.2
    time (min)   b_j = 0.5    b_j = 0.4    b_j = 0.3    b_j = 0.2


10 80 2 sig 4 sig 
9 70 3 sig 7 sig 
8 85 0 sig 5 sig 10 sig 
7 90 3 sig 8 sig 

6 110 3 sig 8 sig 

5 180 10 sig 


Another point to note about Wang and Kennedy’s method is that when
the integration region is near the origin it works better for larger $\nu$,
while when the integration region is off the origin it works better for
smaller $\nu$.

The main problem with Wang and Kennedy’s (1997) method is that the
calculation times required are too large even for low accuracy results
(see the table above). Genz and Bretz (1999) proposed a new method for
computing (6.29) by transforming the p-variate integrand into a product
of univariate integrands. The method is similar to the one used by Genz
(1992) for the multivariate normal integral.

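Letting $MM^T$ be the Cholesky decomposition of R, define the following
transformations

$$x_j = \sum_{k=1}^{p} m_{j,k}\, y_k,$$

$$y_j = u_j \sqrt{\frac{\nu + \sum_{k=1}^{j-1} y_k^2}{\nu + j - 1}},$$

$$u_j = T_{\nu+j-1}^{-1}(z_j),$$

and

$$z_j = d_j + w_j (e_j - d_j),$$

where $T_\tau$ denotes the cdf of the univariate Student’s t distribution
with degrees of freedom $\tau$,

$$d_j = T_{\nu+j-1}(\hat{a}_j),$$

$$e_j = T_{\nu+j-1}(\hat{b}_j),$$

$$\hat{a}_j = a_j' \sqrt{\frac{\nu + j - 1}{\nu + \sum_{k=1}^{j-1} y_k^2}},$$

$$\hat{b}_j = b_j' \sqrt{\frac{\nu + j - 1}{\nu + \sum_{k=1}^{j-1} y_k^2}},$$

$$a_j' = \frac{a_j - \sum_{k=1}^{j-1} m_{j,k}\, y_k}{m_{j,j}},$$

and

$$b_j' = \frac{b_j - \sum_{k=1}^{j-1} m_{j,k}\, y_k}{m_{j,j}}.$$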
Applying the above transformations successively, Genz and Bretz reduced
(6.29) to

$$P = (e_1 - d_1) \int_0^1 (e_2 - d_2) \cdots \int_0^1 (e_p - d_p) \int_0^1 dw \qquad (6.32)$$

$$\phantom{P} = \int_0^1 \int_0^1 \cdots \int_0^1 f(w)\,dw. \qquad (6.33)$$

The transformation has the effect of flattening the surface of the original
function, and P becomes an integral of
$f(w) = (e_1 - d_1) \cdots (e_p - d_p)$ over the (p − 1)-dimensional unit
hypercube. Hence, one has improved numerical tractability and (6.33) can be
evaluated with different multidimensional numerical computation methods.
Genz and Bretz considered three numerical algorithms for this: an
acceptance-rejection sampling algorithm, a crude Monte Carlo algorithm,
and a lattice rule algorithm.

• Acceptance-rejection sampling algorithm: Generate p-dimensional uniform
  random vectors $w_1, w_2, \ldots, w_N$ and estimate P by

  $$\hat{P} = \frac{1}{N} \sum_{l=1}^{N} h(M y_l),$$

  where

  $$h(x) = \begin{cases} 1 & \text{if } a_j \le x_j \le b_j,\ j = 1, 2, \ldots, p, \\ 0 & \text{otherwise} \end{cases}$$

  and

  $$y_{l,j} = T_{\nu+j-1}^{-1}(w_{l,j}) \sqrt{\frac{\nu + \sum_{k=1}^{j-1} y_{l,k}^2}{\nu + j - 1}}, \qquad j = 1, 2, \ldots, p, \quad l = 1, 2, \ldots, N.$$

• A crude Monte Carlo algorithm: Generate (p − 1)-dimensional uniform
  random vectors $w_1, w_2, \ldots, w_N$ and estimate P by

  $$\hat{P} = \frac{1}{N} \sum_{l=1}^{N} f(w_l),$$

  an unbiased estimator of the integral (6.33).


• A lattice rule algorithm (Joe, 1990; Sloan and Joe, 1994): Generate
  (p − 1)-dimensional uniform random vectors $w_1, w_2, \ldots, w_N$ and
  estimate P by


  Here N is the simulation size, usually very small, q corresponds to the
  fineness of the lattice, and $z \in \mathbb{R}^{p-1}$ denotes a
  strategically chosen lattice vector. Braces around vectors indicate that
  each component has to be replaced by its fractional part. One possible
  choice of z follows the good lattice points; see, for example, Sloan and
  Joe (1994).


For all three algorithms, one may use the usual error estimate of the
means to control the simulation error. Perhaps the most intuitive
one of the three is the acceptance-rejection method. However, Deak
(1990) showed that, among various methods, it is the one with the worst
efficiency. Genz and Bretz (2001) proposed the use of the lattice rule
algorithm. Bretz et al. (2001) provided an application of this algorithm
for multiple comparison procedures.

The method of Genz and Bretz (1999) described above also includes
an efficient evaluation of probabilities of the form

$$P = \int_a^b g(x) f(x)\,dx,$$

where g(x) is some nuisance function. Fortran and SAS-IML codes to
implement the method for p < 100 are available from the Web sites with
URLs


http://www.bioinf.uni-hannover.de/~betz/

and

http://www.sci.wsu.edu/math/faculty/genz/homepage.


6.11 Probabilities of Linear Inequalities 


Let X be a random variable characterizing the “load,” and let Y be a
random variable determining the “strength” of a component. Then the
probability that a system is “trouble-free” is Pr(Y > X). In a more
complicated situation, the operation of the system may depend on a
linear combination of random vectors, say $a_1^T X_1 + a_2^T X_2 + b$, and
the probability of a trouble-free operation will be

$$\Pr\left(a_1^T X_1 + a_2^T X_2 + b > 0\right), \qquad (6.34)$$

where $X_j$ are independent $k_j$-dimensional random vectors, $a_j$ are
$k_j$-dimensional constant vectors, and b is a scalar constant. Abusev and
Kolegova (2001) studied the problem of constructing unbiased, maximum
likelihood, and Bayesian estimators of the probability (6.34) when
$X_j$ is assumed to have the multivariate t distribution with mean vector
$\mu_j$ and correlation matrix $R_j$. If $x_{11}, \ldots, x_{1n_1}$ and
$x_{21}, \ldots, x_{2n_2}$ are iid samples from the two multivariate t
distributions, then, in the case where both $\mu_j$ and $R_j$ are unknown,
it was established that the unbiased and the maximum likelihood estimators
are


P(ni/2)T aa 
aD ((ny — ee ((n2 — 1)/2) 


xf TI +) hea a ? dindin 
Qı j= | 


Pr (af Xi + al X, +b> 0) = 


and 
Ts Ts 
j a a X b 
Pr (ai Xi +a? X: +b > 0) = © pa ac t0 
ai Sni4181 + a Sn.+142 


respectively, where 


Q = jo <niana 
2 
5 vjq/nja? Snj41aj + 5 al x; +b> o}, 
j=1 j=l 
E 1S 
Xn; = ao) 5 Xm, 
ny m=1 
ntl 
(nj+1)x; = 5 Xjm, 
m=l 
nj+1 
_ LAP 
(ni +1) Sajptt = J (jm — 3i) (Xjm — Ks) 




and Xn,41 = x. A Bayesian estimator of (6.34) with unknown pa- 
rameters yt; and R; and the Lebesgue measure p(@)d@ = dudR was 
calculated to be 


2 ) 12) nk 
Prg (aj Xı + ay X2 +b > 0) = lee 
jaa TT ((nj — 1)/2) (nj + 1)” 


(nj — kj — 1)/2) 
SIE, = Dp) 


nj-3 


a Tl 1 — z?) ae dz,dz2, 
Q 


2 j=l 


2 2 
Qe = z? < l,j = 1, 2, Soz njal Snj41aj + X al x; +b>0 
j=l j=l 


This Bayesian estimator is biased and is related to the unbiased estima- 
tor via the relation 


Prg (afX;+a3X2+b>0) = APr(a7Xı +a X: +b> 0), 
where 
s nj — kj — 1)/2)T ((n; = k)/2) (nj +) 
jel T (nj — 2k; — 1)/2)T (n;/2) ni? 


The coefficient A can be expanded as 


k k 1 
7 = ts hs eee ener ee 
a Sk (=) 


where $n = \max(n_1, n_2)$ and $k = \max(k_1, k_2)$. Therefore, the Bayesian
estimator is asymptotically unbiased as $n \to \infty$.

Substantial literature is now available on problems concerning proba- 
bilities of the form (6.34) for various distributions. For a comprehensive 
and up-to-date summary, the reader is referred to Kotz et al. (2003). 


6.12 Maximum Probability Content 


Let X be a bivariate random vector with the joint pdf of the form

$$f(x) = g\left((x - \mu)^T R^{-1} (x - \mu)\right), \qquad (6.35)$$

which, of course, includes the bivariate t pdf. Consider the class of
rectangles

$$R(a) = \{(x_1, x_2) : |x_1| \le a,\ |x_2| \le \lambda/(4a)\}$$

with the area equal to $\lambda$. Kunte and Rattihalli (1984) studied the
problem of characterizing the region R in this class for which the
probability $P(R(a)) = \Pr(X \in R(a))$ is maximum. As noted in Rattihalli
(1981), the characterization of such regions is useful for obtaining Bayes
regional estimators when (i) the decision space is the class of rectangular
regions and (ii) the loss function is a linear combination of the area of
the region and the indicator of the noncoverage of the region. It was
shown that, for any fixed $\lambda > 0$, the maximal set is

$$\{(x_1, x_2) : |x_1 - \mu_1| \le c,\ |x_2 - \mu_2| \le \lambda/(4c)\},$$
where c is given by 
à fr? 
IVa 
Here, r denotes the (i,7)th element of the inverse of R. In particular, 


if u = 0, r?? =r?! = p and | p |< 1 in (6.35), then P(R(a)) is increasing 
for a < VX/2 and is decreasing for a > VX/2. 


c 


6.13 Monte Carlo Evaluation 


Let X be a central p-variate t random vector with correlation matrix R
and degrees of freedom $\nu$. Vijverberg (1996) developed a family of
simulators of the multivariate t probability $p = \Pr(X \le x_0)$ based on
Monte Carlo simulation and recursive importance sampling. We shall provide
the basic steps of this rather complicated but powerful procedure.

Define Z = AX, where A is an upper triangular matrix such that
$A^T A = R$. Then it is well known that the pdf of Z can be expressed as
a product of univariate Student’s t pdfs

$$f(z) = \prod_{k=1}^{p} f_1\left(z_k; \sigma_k^2, \nu_k\right),$$

where


and

$$f_1\left(x; \sigma^2, \nu\right) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\nu\pi}\,\sigma\,\Gamma(\nu/2)} \left[1 + \frac{x^2}{\nu\sigma^2}\right]^{-(\nu+1)/2}.$$

We shall denote by $F_1(x; \sigma^2, \nu)$ the cdf corresponding to $f_1$.
For convenience denote $A^{-1} = B = (b_{ij})$, where B is an upper
triangular matrix with $b_{pp} = 1$ and $b_{jj} > 0$ for all j. Then, since
the integral over X covers the region $X \le x_0$, the integral over Z is
determined by the inequality $BZ \le x_0$, and the bounds can be written as

$$z_p \le x_{0p} = z_{p0}$$

and

$$z_k \le b_{kk}^{-1}\left(x_{0k} - \sum_{i=k+1}^{p} b_{ki} z_i\right) = z_{k0}\left(x_{0k}, z_{k+1}, \ldots, z_p\right)$$

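for $k = 1, 2, \ldots, p - 1$. Utilizing this transformation, the
probability p can be written as $p = J_p$, where

$$J_k = \int_{-\infty}^{z_{k0}} f_1\left(z_k; \sigma_k^2, \nu_k\right) J_{k-1}\,dz_k \qquad (6.36)$$

$$\phantom{J_k} = \int_{-\infty}^{z_{k0}} F_1\left(z_{k0}; \sigma_k^2, \nu_k\right) J_{k-1}\, f_1^c\left(z_k; z_{k0}, \sigma_k^2, \nu_k\right) dz_k$$

$$\phantom{J_k} = E_{f_1^c}\left[F_1\left(z_{k0}; \sigma_k^2, \nu_k\right) J_{k-1}\right], \qquad k = 2, 3, \ldots, p,$$

where

$$f_1^c\left(z_k; z_{k0}, \sigma_k^2, \nu_k\right) = \frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{F_1\left(z_{k0}; \sigma_k^2, \nu_k\right)}$$

is the univariate conditional t pdf for $z_k \le z_{k0}$ and
$J_1 = F_1(z_{10}; \sigma_1^2, \nu_1)$. Hence, $J_k$ is the probability
over the range of $(z_1, \ldots, z_k)$ conditional on the values for
$(z_{k+1}, \ldots, z_p)$.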

The Monte Carlo simulation starts off by drawing random values of
$z_p$ from the distribution $f_1^c(\cdot; z_{p0}, \sigma_p^2, \nu_p)$,
which we shall denote by $\tilde{z}_{p,r}$, $r = 1, \ldots, R$. Each of
these yields a different bound $\tilde{z}_{p-1,0,r}$ and parameter value
$\tilde{\sigma}_{p-1,r}^2$ for each draw of $z_{p-1}$;
$\tilde{z}_{p-1,r}$ is then drawn from the distribution
$f_1^c(\cdot; \tilde{z}_{p-1,0,r}, \tilde{\sigma}_{p-1,r}^2, \nu_{p-1})$.
This process continues until $\tilde{z}_{2,r}$ is drawn and
$\tilde{J}_{1,r} = F_1(\tilde{z}_{10,r}; \tilde{\sigma}_{1,r}^2, \nu_1)$
is computed with a commonly available approximation routine for the
univariate Student’s t cdf. The simulated estimate of p is then found as
the sample average of the $J_p$ values across the simulated sample of R
elements


R 
: s 1 ” i s 
p= J= 3 yA (20363 Č, Vp) Spats 
r=1 
where $\tilde{J}_{k,r} = F_1(\tilde{z}_{k0,r}; \tilde{\sigma}_{k,r}^2, \nu_k)\,\tilde{J}_{k-1,r}$
for $k = 2, \ldots, p - 1$. It is more efficient to estimate $J_p$ by
averaging over a large number of elements than to obtain close
approximations of its components $J_k$ for k < p. Therefore, a better
estimate for p is


E (2 
p= ao {A Gantan) }. 
r=1 \k=1 


The right-hand side of (6.36) remains unchanged if the integrand is
divided or multiplied by any nonzero function of z. Let $g_p$ be a
p-dimensional pdf such that

$$g_p(z; \nu) = \prod_{k=1}^{p} g_1\left(z_k; \tau_k^2\right),$$

where $g_1$ is a univariate pdf of a type to be mentioned below with
$\operatorname{Var}(z_k) = \tau_k^2 = \sigma_k^2 \nu_k/(\nu_k - 2)$, and
$\sigma_k^2$ and $\nu_k$ are as defined above. Let $G_1(z_k; \tau_k^2)$ be
the associated cdf, and let

$$g_1^c\left(z_k; z_{k0}, \tau_k^2\right) = \frac{g_1\left(z_k; \tau_k^2\right)}{G_1\left(z_{k0}; \tau_k^2\right)}$$

be the conditional pdf. Finally, let

$$g_p^c(z; z_0, \nu) = \prod_{k=1}^{p} g_1^c\left(z_k; z_{k0}, \tau_k^2\right).$$

With these definitions, one can write $p = J_p$ in terms of

$$J_k = \int_{-\infty}^{z_{k0}} \frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{g_1^c\left(z_k; z_{k0}, \tau_k^2\right)}\, J_{k-1}\, g_1^c\left(z_k; z_{k0}, \tau_k^2\right) dz_k = E_{g_1^c}\left[\frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{g_1^c\left(z_k; z_{k0}, \tau_k^2\right)}\, J_{k-1}\right], \qquad k = 2, \ldots, p,$$

and $J_1 = F_1(z_{10}; \sigma_1^2, \nu_1)$. Clearly, $g_p^c$, and, more
particularly, $g_1^c$, is an importance sampling density (see, for example,
Hammersley and Handscomb, 1964). To evaluate p, the procedure is as
follows: Generate random drawings $\tilde{z}_{p,r}$ for
$r = 1, \ldots, R$ from the distribution
$g_1^c(\cdot; z_{p0}, \tau_p^2)$; compute the implied values
$\tilde{z}_{p-1,0,r}$ and $\tilde{\tau}_{p-1,r}^2$ for each drawing of
$z_{p-1}$; draw $\tilde{z}_{p-1,r}$ from the distribution
$g_1^c(\cdot; \tilde{z}_{p-1,0,r}, \tilde{\tau}_{p-1,r}^2)$; and continue
on until $\tilde{z}_{2,r}$ is drawn and $J_1$ is computed. Based on this
procedure, p may be written in the form


ss ~2 
1 R 2 p fi (Zis erie 92... Ye) 
P = R x F (210,73 Tir) I ak a5 
r=1 k=1 g$ (Ze. ZkO,r> z) 
Three suitable choices for the importance density function are


• the logit with

  $$g_1(z) = \frac{\lambda}{\tau}\, q (1 - q),$$

  where

  $$q = \left[1 + \exp(-\lambda z/\tau)\right]^{-1}$$

  and $\lambda = \pi/\sqrt{3}$;
e transformed beta (2, 2) density (Vijverberg, 1995) with 

q(x) = 627(1—2)?, 

where 


exp (z/c) 
1+ exp (z/o) 


• the normal $N(0, \sigma^2)$ density.


Vijverberg (1997, 2000) has developed a new family of simulators that 
extends the above research on the simulation of high-order probabilities. 
For instance, Vijverberg (2000) has reported that the gain in precision 
using the new family translates into a 40% savings in computational 
time. 


7 
Probability Inequalities 


Probability inequalities on $\Pr(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_p \le y_p)$
for multivariate distributions have been a popular topic of investigation
since the 1950s. It is well known (Khatri, 1967; Scott, 1967; Šidák, 1965,
1967) that, for arbitrary positive numbers $y_1, y_2, \ldots, y_p$, the
inequality

$$\Pr\left(|Y_1| \le y_1, |Y_2| \le y_2, \ldots, |Y_p| \le y_p\right) \ge \prod_{k=1}^{p} \Pr\left(|Y_k| \le y_k\right)$$

holds for any random vector $Y^T = (Y_1, Y_2, \ldots, Y_p)$ having the
multivariate normal distribution with zero means and an arbitrary
correlation matrix. A question then arises as to whether there is an analog
of this for multivariate t distributions.


7.1 Dunnett and Sobel’s Probability Inequalities 
Dunnett and Sobel (1955) obtained bounds for the probability integral

$$P = \Pr\left(X_1 \le x_1, X_2 \le x_2, \ldots, X_p \le x_p\right),$$

$x_1 > 0$, $x_2 > 0$, $\ldots$, $x_p > 0$, when $(X_1, X_2, \ldots, X_p)$
follows the central p-variate t distribution with degrees of freedom $\nu$
and the correlation matrix R taking the special structure
$r_{ij} = b_i b_j$ for all $i \neq j$. Using the definition that X can be
represented as $(Z_1/S, Z_2/S, \ldots, Z_p/S)$, where Z is a p-variate
normal random vector with correlation matrix R and $\nu S^2/\sigma^2$ is an
independent chi-squared random variable with degrees of freedom $\nu$, one
can rewrite P as

$$P = \Pr\left\{\frac{Z_1}{S} \le x_1, \frac{Z_2}{S} \le x_2, \ldots, \frac{Z_p}{S} \le x_p\right\} = \int_0^{\infty} G\left(x_1 s, x_2 s, \ldots, x_p s\right) h(s)\,ds, \qquad (7.1)$$


where G is the joint cdf of Z and h is the pdf of S. If
$Y_0, Y_1, Y_2, \ldots, Y_p$ are independent standard normal random
variables, then one can represent
$Z_j = \sqrt{1 - b_j^2}\,Y_j - b_j Y_0$ for $j = 1, 2, \ldots, p$. Using
this result, one can rewrite G as

$$G\left(x_1 s, x_2 s, \ldots, x_p s\right) = \Pr\left\{\sqrt{1 - b_j^2}\,Y_j - b_j Y_0 \le x_j s,\ j = 1, 2, \ldots, p\right\} = \int_{-\infty}^{\infty} \prod_{j=1}^{p} \Phi\left(\frac{x_j s + b_j y}{\sqrt{1 - b_j^2}}\right) \phi(y)\,dy, \qquad (7.2)$$


where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the
standard normal distribution. Using the well known inequality


Pitino} > []#{K(o)} (7.3) 
j=l j=l 


(where F} denotes a cdf), one can now bound G by 


G (£18, £28, ..- , Ep3) > H T g 
7 
Iie 


Vv 
id 


Substituting this result into (7.1) and applying (7.3) once more, one 
obtains the lower bound for P given by Dunnett and Sobel (1955) as 


A IG (x;s) f(s)ds 
I [E sitoa 


P 


IV 


\V 




Pp 
II Pr {Z; < x58} 


j=l 


= JJe: {X; < z;}. (7.4) 


j=1 


This lower bound for P holds more generally for any correlation matrix
R with $r_{ij} \ge 0$ and any arbitrarily fixed $(x_1, \ldots, x_p)$. This
is a consequence of the fact that P is an increasing function of each
$r_{ij}$ for all $i \neq j$, while other correlations are held fixed. It
can be shown further that


\[ \Pr(X_1 > x_1, \ldots, X_p > x_p) \ge \prod_{j=1}^{p} \Pr(X_j > x_j) \]

and

\[ \Pr(|X_1| \le x_1, \ldots, |X_p| \le x_p) \ge \prod_{j=1}^{p} \Pr(|X_j| \le x_j). \]

Since the bound (7.4) does not depend on $r_{ij}$, it can be calculated
easily from a table of the cdf of the univariate Student's $t$
distribution. Dunnett and Sobel (1955) also obtained two sharper bounds
by slight modifications of the above arguments: For even $p \ge 2$,

\[ P \ge \prod_{j=1}^{p/2} \Pr\{X_{2j-1} < x_{2j-1},\ X_{2j} < x_{2j}\}, \tag{7.5} \]

and for odd $p \ge 3$,

\[ P \ge \Pr\{X_1 < x_1\} \prod_{j=1}^{(p-1)/2} \Pr\{X_{2j} < x_{2j},\ X_{2j+1} < x_{2j+1}\}. \tag{7.6} \]
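As a rough numerical illustration of the bound (7.4), the following Python sketch simulates the representation $X_j = Z_j/S$ with $Z_j = \sqrt{1 - b_j^2}\, Y_j - b_j Y_0$ and compares a Monte Carlo estimate of $P$ with the product of the estimated marginal probabilities. The function names, the values of $b_j$, $\nu$ and $x_j$, the sample size, and the seed are illustrative assumptions, not taken from Dunnett and Sobel (1955).

```python
import math
import random

def simulate_t_vector(b, nu, rng):
    """Draw X = Z/S with Corr(Z_i, Z_j) = b_i*b_j (sigma = 1 assumed)."""
    y0 = rng.gauss(0.0, 1.0)
    z = [math.sqrt(1.0 - bj * bj) * rng.gauss(0.0, 1.0) - bj * y0 for bj in b]
    # nu*S^2 is chi-squared(nu), i.e. Gamma(shape=nu/2, scale=2)
    s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return [zj / s for zj in z]

def estimate_joint_and_product(b, nu, x, n_draws=50_000, seed=1):
    """Return Monte Carlo estimates of P and of the product bound in (7.4)."""
    rng = random.Random(seed)
    joint = 0
    marginal = [0] * len(b)
    for _ in range(n_draws):
        xs = simulate_t_vector(b, nu, rng)
        below = [xi < ci for xi, ci in zip(xs, x)]
        joint += all(below)
        for j, flag in enumerate(below):
            marginal[j] += flag
    p_joint = joint / n_draws
    p_product = math.prod(m / n_draws for m in marginal)
    return p_joint, p_product

p_joint, p_product = estimate_joint_and_product(
    b=[0.5, 0.6, 0.7], nu=10, x=[1.0, 1.5, 2.0])
print(f"estimated P = {p_joint:.4f}, estimated bound (7.4) = {p_product:.4f}")
```

Both numbers are simulation estimates, so the comparison is only indicative; in practice the bound would be read from a table of the univariate $t$ cdf, as noted above.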


In the case where $r_{ij} = \rho$ for all $i \ne j$ and $x_j = x$ for all
$j$, inequalities that are sharper than (7.5) and (7.6) can be obtained.
Let

\[ \beta_1(p) = \Pr(X_1 < d, X_2 < d, \ldots, X_p < d) \]

and

\[ \beta_2(p) = \Pr(|X_1| < d, |X_2| < d, \ldots, |X_p| < d). \]

It is well known that $\beta_k(p)$, $k = 1, 2$ are monotonically increasing
in $r_{ij}$ ($i \ne j$) and, if $r_{ij} \ge 0$, then

\[ \beta_k(p) \ge \beta_k^p(1), \qquad k = 1, 2. \tag{7.7} \]

Tong (1970) provided the following sharper bounds for $\beta_k$,

\[ \beta_k(p) \ge \{\beta_k(m)\}^{p/m} \ge \beta_k^p(1) + \{\beta_k(2) - \beta_k^2(1)\}^{p/2}, \tag{7.8} \]

where $p \ge m \ge 2$. These inequalities certainly improve on (7.7), but
neither of them is very sharp when $p$ is large. Also observe that the
first inequality in (7.8) depends on $p$ and $m$ only through their ratio
$p/m$. Hence, for fairly large $p$ and $m$, as long as $p/m$ is close to 1
(even if the difference $p - m$ is not small), the first inequality is
quite adequate. If $\rho = 0$, then a necessary and sufficient condition
for $\beta_k(p) = \beta_k^p(1)$ for every fixed $p$ is that
$\nu \to \infty$.

Recently Seneta (1993) pointed out that the "sub-Markov" inequality

\[ \beta_1(p) \ge \frac{[\Pr\{X_1 < x, X_2 < x\}]^{p-1}}{[\Pr\{X_1 < x\}]^{p-2}} \tag{7.9} \]

is sharper than the corresponding inequality

\[ \beta_1(p) \ge [\Pr\{X_1 < x, X_2 < x\}]^{p/2} \tag{7.10} \]

as given by (7.8). This fact is illustrated in the following table, which
is taken from Seneta (1993).

Comparison of the bounds (7.10) and (7.9) for $P$,
with $x$ chosen such that the true value of $P = 0.95$

   p    $\nu$     x     Bound (7.10)   Bound (7.9)

   3     10     2.34       0.945          0.946
   3     15     2.24       0.946          0.947
   3     20     2.19       0.945          0.946
   3     60     2.10       0.944          0.945
   9     10     2.81       0.921          0.921
   9     15     2.67       0.924          0.924
   9     20     2.60       0.926          0.927
   9     60     2.48       0.934          0.936


Actually, (7.9) is a particular case of the following inequalities

\[ \beta_1(p) \ge \Pr(X_j < x;\ j = 1, \ldots, m-1)
   \times \{\Pr(X_m < x \mid X_j < x;\ j = 1, \ldots, m-1)\}^{p-m+1} \tag{7.11} \]

and

\[ \beta_2(p) \ge \Pr(|X_j| < x;\ j = 1, \ldots, m-1)
   \times \{\Pr(|X_m| < x \mid |X_j| < x;\ j = 1, \ldots, m-1)\}^{p-m+1} \]

given by Glaz and Johnson (1984), who also provided a formal proof of
the fact that (7.11) is sharper than (7.8).

Dunnett (1989) wrote a Fortran program for evaluating the integral
(7.2). It uses Simpson's rule to compute an approximation to (7.2) in
such a way that a prescribed accuracy is achieved. To approximate the
integral of a function $\alpha(x)$, say, over an interval $[a, b]$ using
Simpson's rule, the value of the function is computed at the two end
points and at the midpoint of the interval; then the approximate value
of the integral is given by

\[ \frac{b - a}{6} \left\{ \alpha(a) + 4\alpha\left( \frac{a + b}{2} \right) + \alpha(b) \right\} \]

with its error bounded by $\alpha_4(a, b)(b - a)^5/2880$, where
$\alpha_4(a, b)$ is a bound on the absolute value of the fourth derivative
of $\alpha(x)$ over the interval (see, for example, page 66 in Shampine
and Allen, 1973). The central processor unit time (on a VAX 8600
computer using single-precision arithmetic) taken to compute (7.2)
ranges from 0.01 to 2.37 seconds for cases of equal correlation
($r_{ij} = \rho$) and identical ranges of integration ($x_j = x$).
Slightly longer computing times are required for unequal correlations
or different limits of integration. Dunnett (1989) suggested that his
program can be used along with an appropriate numerical integration
routine, such as the International Mathematical and Statistical
Libraries' (1987, Volume 1, Chapter 4) QDAGI, to evaluate the
multivariate $t$ probability integral (7.1).
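The three-point rule and the idea of refining until a prescribed accuracy is reached can be sketched in Python as follows. This is a generic adaptive Simpson scheme written for illustration; it is not a transcription of Dunnett's Fortran program, and the function names and the tolerance handling are assumptions.

```python
import math

def simpson(f, a, b):
    """Three-point Simpson estimate of the integral of f over [a, b]."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f(0.5 * (a + b)) + f(b))

def adaptive_simpson(f, a, b, tol=1e-8, depth=50):
    """Halve the interval until two successive estimates agree within tol."""
    whole = simpson(f, a, b)
    mid = 0.5 * (a + b)
    halves = simpson(f, a, mid) + simpson(f, mid, b)
    if depth == 0 or abs(halves - whole) <= 15.0 * tol:
        # Richardson-type correction of the refined estimate
        return halves + (halves - whole) / 15.0
    return (adaptive_simpson(f, a, mid, tol / 2.0, depth - 1)
            + adaptive_simpson(f, mid, b, tol / 2.0, depth - 1))

def std_normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

# Integral of the standard normal pdf over [-1, 1]
approx = adaptive_simpson(std_normal_pdf, -1.0, 1.0)
print(approx, math.erf(1.0 / math.sqrt(2.0)))
```

For the single-panel rule applied to $x^4$ on $[0, 1]$, the fourth derivative is the constant 24, so the error bound $24/2880$ is attained exactly; this gives a quick check of the error formula quoted above.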


7.2 Dunn’s Probability Inequalities 


The univariate Student's $t$ distribution has the property that the
probability evaluated from $-x$ to $+x$ is an increasing function of the
degrees of freedom $\nu$; this also applies to the probability from
$-\infty$ to $+x$ (see, for example, Ghosh, 1973, for details). Dunn
(1965) pointed out that this monotonicity does not generalize to $p$
dimensions in the usual multivariate $t$ distribution. Specifically, let
$X$ have the central $p$-variate $t$ distribution with degrees of freedom
$\nu$ and correlation matrix $R$. If $F(x)$ is defined by

\[ F(x) = \Pr\left\{ \bigcap_{i=1}^{p} (-x \le X_i \le x) \right\}, \]

then $F(x)$ equals the probability mass in the multivariate $t$
distribution evaluated over a $p$-dimensional hypercube centered at the
origin with half side $x$, and $F(x)$ is the distribution of the maximum
of the absolute values of the $p$ $X$ variables. Similarly, if $G(x)$ is
defined by

\[ G(x) = \Pr\left\{ \bigcap_{i=1}^{p} (-\infty < X_i \le x) \right\}, \]

then $G(x)$ equals the probability mass evaluated from $-\infty$ to $x$ in
each direction and $G(x)$ is the distribution of the maximum of the $p$
$X$ variables. Dunn showed that, for any given $x > 0$ and degrees of
freedom $\nu_1 > \nu_2$, there exists an integer $K$ such that, for all
$p > K$,

\[ F_{p,\nu_1}(x) < F_{p,\nu_2}(x) \]

and

\[ G_{p,\nu_1}(x) < G_{p,\nu_2}(x). \]

Here, $F_{p,\nu}$ and $G_{p,\nu}$ are $F$ and $G$ as defined above, with
dimension $p$ and degrees of freedom $\nu$. This result covers the case
of all correlations equal to 0. When all correlations are equal to 1,
the distribution is the same as the univariate Student's $t$
distribution, so that, for all dimensions, $F(x)$ and $G(x)$ are
monotonically increasing functions of $\nu$. Other correlation matrices
may be considered in some sense to lie between these two extremes. In
various unpublished tables of $F(x)$, the change is found to occur at a
dimension where $F(x)$ is approximately 0.25 or 0.30.
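The reversal described by Dunn can be checked numerically in the zero-correlation case, where conditioning on $S$ as in (7.1) gives $F_{p,\nu}(x) = \int_0^\infty \{2\Phi(xs) - 1\}^p h(s)\, ds$ with $h$ the pdf of $S = \sqrt{\chi^2_\nu/\nu}$. The Python sketch below evaluates this integral by a composite Simpson rule; the function names, the integration range, the grid size, and the particular values of $p$, $\nu$ and $x$ are illustrative assumptions.

```python
import math

def pdf_s(s, nu):
    """Density of S = sqrt(chi2_nu / nu) at s > 0."""
    log_h = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             - math.lgamma(0.5 * nu)
             + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
    return math.exp(log_h)

def cube_probability(p, nu, x, upper=12.0, n=4000):
    """F_{p,nu}(x) for zero correlations, by composite Simpson on [0, upper]."""
    def integrand(s):
        if s <= 0.0:
            return 0.0  # 2*Phi(0) - 1 = 0, so the integrand vanishes at s = 0
        # 2*Phi(x*s) - 1 equals erf(x*s/sqrt(2))
        return math.erf(x * s / math.sqrt(2.0)) ** p * pdf_s(s, nu)
    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0

for p in (1, 200):
    print(p, cube_probability(p, 30, 1.0), cube_probability(p, 2, 1.0))
```

With these assumed values, the probability for $\nu_1 = 30$ exceeds that for $\nu_2 = 2$ when $p = 1$, while the ordering is reversed for $p = 200$, in line with Dunn's result.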


7.3 Halperin’s Probability Inequalities 


Halperin (1967) extended the inequality (7.4) for generalized bivariate
$t$ distributions as follows. Let $(Y_{i1}, Y_{i2})$, $i = 0, 1, 2, \ldots, r$,
$r \ge 1$ be independent samples from a bivariate normal distribution
with zero means, variances $\sigma_1^2$, $\sigma_2^2$, and covariances
$\sigma_1 \sigma_2 \rho_i$, $|\rho_i| < 1$. Let $Y_{i1}$,
$i = r+1, \ldots, r+n$ and $Y_{i2}$, $i = r+1, \ldots, r+m$ be independent
normal samples with zero means and variances $\sigma_1^2$ and
$\sigma_2^2$, respectively, and independent of $(Y_{i1}, Y_{i2})$,
$i = 0, 1, 2, \ldots, r$. Define

\[ (X_1, X_2) = \left( \frac{Y_{01}}{S_1}, \frac{Y_{02}}{S_2} \right), \]

where

\[ S_1^2 = \frac{1}{r+n} \sum_{i=1}^{r+n} Y_{i1}^2 \]

and

\[ S_2^2 = \frac{1}{r+m} \sum_{i=1}^{r+m} Y_{i2}^2. \]

Halperin (1967) then showed that the probability integral of
$(X_1, X_2)$ satisfies the inequality

\[ \Pr(|X_1| < x_1, |X_2| < x_2) \ge \Pr(|X_1| < x_1) \Pr(|X_2| < x_2) \]

for all real numbers $x_1$ and $x_2$.


7.4 Šidák’s Probability Inequalities 

In the bivariate case considered above, it is assumed that the
correlation between $Y_{i1}$ and $Y_{i2}$ may be different for different
$i$'s. For a general $p$, but for a special correlation structure of
$Y$'s, Šidák (1967) established the following result. Let
$Y^T = (Y_1, Y_2, \ldots, Y_p)$ have a $p$-variate normal distribution
with zero means and an arbitrary correlation matrix. Let
$Z_i^T = (Z_{i1}, Z_{i2}, \ldots, Z_{ip})$, $i = 1, \ldots, n$ be a
$p$-variate normal random sample, which is mutually independent and
independent of $Y$, each of which has zero means, unit variances, and
the decomposable correlation structure given by

\[ \mathrm{Corr}(Z_{ki}, Z_{kj}) = b_i b_j \]

for $i, j = 1, \ldots, p$; $i \ne j$; $k = 1, \ldots, n$ with
$0 \le b_i < 1$, $i = 1, \ldots, p$. Then,

\[ \Pr\left( \frac{|Y_1|}{\sqrt{Z_{11}^2 + \cdots + Z_{n1}^2}} < x_1, \ldots,
             \frac{|Y_p|}{\sqrt{Z_{1p}^2 + \cdots + Z_{np}^2}} < x_p \right)
   \ge \prod_{i=1}^{p} \Pr\left( \frac{|Y_i|}{\sqrt{Z_{1i}^2 + \cdots + Z_{ni}^2}} < x_i \right). \tag{7.12} \]

Essentially the same result, assuming more generally only $|b_i| < 1$,
follows by an easy specialization of Corollary 8 in Khatri (1967).

A general proof of the inequality (7.12) under the assumption that
$Y$ and all $Z_i$'s have the same normal distribution with zero means
and an arbitrary covariance matrix was provided by Scott (1967).
Unfortunately, this proof is correct only for $p = 2$, and Šidák (1971)
produced a counterexample showing its incorrectness for $p > 2$. Šidák
(1971) went on to show that, if

\[ \mathrm{Corr}(Y_i, Y_j) = c_i c_j r_{ij} \]

for $i, j = 1, 2, \ldots, p$; $i \ne j$ with $|c_j| < 1$
($j = 1, 2, \ldots, p$) and $\{r_{ij}\}$ any fixed correlation matrix, and
if

\[ \mathrm{Corr}(Z_{li}, Z_{lj}) = b_{li} b_{lj} \]

for $i, j = 1, 2, \ldots, p$; $i \ne j$; $l = 1, 2, \ldots, n$ with
$|b_{li}| < 1$ ($i = 1, 2, \ldots, p$, $l = 1, 2, \ldots, n$), then the
left-hand side probability in (7.12) as a function of $c_j$ is
nonincreasing for $-1 < c_j \le 0$ and nondecreasing for
$0 \le c_j < 1$, so that it has a minimum for $c_j = 0$ and, as a
function of $b_{li}$, is nonincreasing for $-1 < b_{li} \le 0$ and
nondecreasing for $0 \le b_{li} < 1$, so that it has a minimum for
$b_{li} = 0$. Hence, (7.12) is also true for this more general
correlation structure.

Šidák (1973) obtained an inequality using exchangeability when $X$ is
a central $p$-variate $t$ random vector with the equicorrelation
structure $r_{ij} = \rho$, $i \ne j$. He showed that

\[ \Pr(b < X_1 < a, \ldots, b < X_p < a)
   \ge \{\Pr(b < X_1 < a, \ldots, b < X_r < a)\}^{p/r}
   \ge \{\Pr(b < X_1 < a)\}^{p} \]

for all $p \ge r \ge 2$ and $a > b$. In an earlier paper, Tong (1970)
obtained similar results for a much larger class of random vectors.


7.5 Tong’s Probability Inequalities 


We noted in Chapter 1 that the multivariate $t$ density is Schur-concave
in the particular case $r_{ij} = \rho$, $i \ne j$. Tong (1982) used this
property to derive certain probability inequalities. He showed that if
$f : \mathbb{R}^p \to [0, \infty)$ is Borel-measurable and Schur-concave,
then, provided that the integral exists, $\int_{A(x)} f(y)\, dy$ is also a
Schur-concave function of $(x_1, \ldots, x_p)$, where $A(x)$ denotes the
rectangular set

\[ A(x) = \{ y \mid y \in \mathbb{R}^p,\ |y_j| \le x_j,\ j = 1, \ldots, p \}. \]

Taking $f$ to be the pdf of the multivariate $t$, it follows that

\[ \Pr(|X_j| \le x_j,\ j = 1, \ldots, p) \le \Pr(|X_j| \le \bar{x},\ j = 1, \ldots, p), \]

where $\bar{x} = \frac{1}{p}(x_1 + \cdots + x_p)$.
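Tong's inequality can be illustrated numerically in the simplest equicorrelated case $\rho = 0$, where conditioning on $S = \sqrt{\chi^2_\nu/\nu}$ gives $\Pr(|X_j| \le x_j,\ j = 1, \ldots, p) = \int_0^\infty \prod_j \{2\Phi(x_j s) - 1\}\, h(s)\, ds$. The Python sketch below compares an unequal rectangle with the cube of half side $\bar{x}$; the function name, the values of $x_j$ and $\nu$, and the integration settings are illustrative assumptions.

```python
import math

def rectangle_probability(x, nu, upper=12.0, n=4000):
    """Pr(|X_j| <= x_j for all j) for the multivariate t with rho = 0."""
    log_c = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             - math.lgamma(0.5 * nu))
    def integrand(s):
        if s <= 0.0:
            return 0.0
        # density of S = sqrt(chi2_nu / nu)
        h = math.exp(log_c + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
        prod = 1.0
        for xj in x:
            prod *= math.erf(xj * s / math.sqrt(2.0))  # 2*Phi(xj*s) - 1
        return prod * h
    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0

x = [0.5, 1.5, 2.5]
x_bar = sum(x) / len(x)
lhs = rectangle_probability(x, nu=5)
rhs = rectangle_probability([x_bar] * len(x), nu=5)
print(f"unequal sides: {lhs:.4f}, equal sides: {rhs:.4f}")
```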


8 


Percentage Points 


From the 1950s numerous authors have tried to compute the percentage 
points of multivariate t distributions. It is an indication of the interest 
in problems leading to applications of this “new” distribution. This re- 
search continued well into the 1990s and is still going strong. Although 
some of the results have by now lost their practical importance — in our 
opinion — it is essential for historical reasons to describe a majority of 
these contributions. This will certainly assist historians and experts in 
multivariate distributions to gain a better perspective of the develop- 
ments in the area. Moreover, many of the techniques involved in these 
calculations are ingenious and worthy of emulation and further investi- 
gation. 


8.1 Dunnett and Sobel’s Percentage Points 


Let $(X_1, X_2)$ have the bivariate $t$ distribution with joint pdf
(6.1). For given $\nu$, $\rho$ and probability level $\gamma$, let $d$
denote the equicoordinate percentage point satisfying

\[ \int_{-\infty}^{d} \int_{-\infty}^{d} f(x_1, x_2; \nu, \rho)\, dx_1 dx_2 = \gamma. \]

The value of $d$ can be determined for any $\nu$ by trial and error using
(6.3) and (6.4). However, this procedure becomes more involved as $\nu$
increases. Dunnett and Sobel (1954) derived an asymptotic expansion
in powers of $1/\nu$ that expresses $d$ in terms of the corresponding
quantity $e$ for the bivariate normal distribution: $e$ is defined by

\[ \int_{-\infty}^{e} \int_{-\infty}^{e} f(x_1, x_2; \infty, \rho)\, dx_1 dx_2 = \gamma \]

and can be obtained by interpolation in the classical tables of Pearson
(1931), for example. Their expansion yields a good approximation of
$d$ even for moderately small values of $\nu$. The method of derivation
is essentially the same as that used by Fisher (1941) in deriving an
asymptotic expansion for the percentage points of the univariate
Student's $t$ distribution. Up to the terms of $O(1/\nu)$, Dunnett and
Sobel obtained


e t'(t 
d = e~—<e?4+1- 
e 5 fe + FO } 
and by inverting 


e = arifesi- SO}. 


They also tabulated numerical values of the coefficients of the terms
$1/\nu^i$ for $i = 0, 1, 2, 3, 4$.


8.2 Krishnaiah and Armitage’s Percentage Points 


Consider $X$ having a central $p$-variate $t$ distribution with degrees
of freedom $\nu$ and with the equicorrelation structure $r_{ij} = \rho$,
$i \ne j$. Krishnaiah and Armitage (1966) evaluated the multivariate
percentage point $d$ by solving the integral equation

\[ \int_{-\infty}^{d} \cdots \int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu, \rho)\, dx_1 \cdots dx_p = \gamma \tag{8.1} \]

and produced extensive tables for all combinations of $p = 1(1)10$;
$\nu = 5(1)35$; $\rho = 0.05(0.05)0.9$ and $\gamma = 0.90, 0.95, 0.975,
0.99$ (see also Armitage and Krishnaiah, 1965). These computations use
the approximation that


C pyv—-l 


A z” exp (2°) f° 
ra saf rem Le 


(fae) 


where $\Phi$ is the cdf of the standard normal distribution and the upper
limit $c$ is chosen large enough to make the error of approximation as
small as desired. Krishnaiah and Armitage (1966) took $c = 10$.
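For readers who want to reproduce a value of $d$ in (8.1), the following Python sketch combines the representation (7.1)-(7.2) with $b_j = \sqrt{\rho}$, that is, $P(d) = \int_0^\infty \int_{-\infty}^{\infty} \Phi^p\{(ds + \sqrt{\rho}\, y)/\sqrt{1 - \rho}\}\, \phi(y)\, dy\, h(s)\, ds$, with bisection on $d$. It is not the scheme used by Krishnaiah and Armitage: the truncation ranges, grid sizes, tolerance, and function names are assumptions chosen for illustration, and $\nu > 1$, $0 \le \rho < 1$ are assumed.

```python
import math

def simpson_nodes(a, b, n):
    """Nodes and weights of the composite Simpson rule (n even)."""
    step = (b - a) / n
    nodes = [a + i * step for i in range(n + 1)]
    weights = [step / 3.0 * (1.0 if i in (0, n) else 4.0 if i % 2 else 2.0)
               for i in range(n + 1)]
    return nodes, weights

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_probability(d, p, nu, rho, n_s=120, n_y=120):
    """P(d) = Pr(X_1 <= d, ..., X_p <= d) for the equicorrelated t."""
    log_c = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             - math.lgamma(0.5 * nu))
    s_nodes, s_w = simpson_nodes(0.0, 5.0, n_s)    # assumed range for S
    y_nodes, y_w = simpson_nodes(-8.0, 8.0, n_y)   # assumed range for Y_0
    phi = [math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi) for y in y_nodes]
    root_rho, root_1m = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for s, ws in zip(s_nodes, s_w):
        if s == 0.0:
            continue  # h(0) = 0 when nu > 1
        h = math.exp(log_c + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
        inner = sum(wy * f * norm_cdf((d * s + root_rho * y) / root_1m) ** p
                    for y, wy, f in zip(y_nodes, y_w, phi))
        total += ws * h * inner
    return total

def percentage_point(gamma, p, nu, rho, lo=0.0, hi=10.0, tol=1e-4):
    """Bisection on d, using that P(d) is increasing in d."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if one_sided_probability(mid, p, nu, rho) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(percentage_point(0.95, 3, 10, 0.5))
```

For $p = 1$ the result should reduce to the usual one-sided univariate $t$ percentage point, which gives a simple sanity check of the quadrature settings.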




8.3 Gupta et al.’s Percentage Points 
Gupta et al. (1985) solved equation (8.1) by setting 


|P(d)-y| < 107° 


with P(d) computed by the approximation (6.14). The numerical eval- 
uation of (6.14) involves the evaluation of the derivatives G®) (d) given 
by (6.15) for k = 0,1,...,8. But it is easily seen from (7.2) that 


ae a Vez +d 
(k) = —— p 
Gd) OR? fs ( ET ) Hy, (z)b(z)dz, (8.2) 
where $H_k(z)$ is the Hermite polynomial of degree $k$ given by (6.11).
By letting $A = \sqrt{\rho}/\sqrt{1 - \rho}$ and $B = 1/\sqrt{1 - \rho}$
and changing the variable by the transformation $u = Az + Bd$, (8.2) can
be rewritten as


amin = 3 (BY em (SE) 


Gupta et al. (1985) approximated this integral by 


9A+4B k E 7 
a = 3" (wom (Ra 


and the integration was carried out by Gauss’ method over intervals of 
length D = 0.5 starting from —9 until 9A + 4B was included. They 
provided tabulations of the percentage point d for all combinations of 


• $p = 1(1)9(2)19$; $\nu = 15(1)20, 24, 30, 36, 48, 60, 120, \infty$;
  $\gamma = 0.75, 0.9, 0.95, 0.99$ and $\rho = 0.1, 0.2(0.1)0.6$;

• $p = 1(1)9(2)15$; $\nu = 15, 17, 20, 24, 36, 60, 120, \infty$;
  $\gamma = 0.9, 0.95$ and $\rho = 0.7(0.1)0.9$.


8.4 Rausch and Horn’s Percentage Points 


Rausch and Horn (1988) considered the particular case of (8.1) when
the common $\rho = 0$. They used the approximation


P(d) & wa? (=). 


The weights w; are calculated according to the formula 


_ T(n+ 8) l Mug 
= ranra mO 


where $\beta = \nu/2 - 1$,

\[ L_n^{\beta}(z) = \sum_{i=0}^{n} \binom{n + \beta}{n - i} \frac{(-z)^i}{i!} \]

are the Laguerre polynomials and $z_1, \ldots, z_n$ are the zeros of
$L_n^{\beta}(z)$.

Rausch and Horn computed $d$ for all combinations of $3 \le p \le 100$;
$5 \le \nu \le 120$, $\nu = \infty$; and $0.5 \le \gamma \le 0.99$.


8.5 Hahn and Hendrickson’s Percentage Points 


Hahn and Hendrickson (1971) computed percentage points $d$ by solving
the equation

\[ P(d) = \int_{-d}^{d} \cdots \int_{-d}^{d} f(x_1, \ldots, x_p; \nu, \rho)\, dx_1 \cdots dx_p = \gamma \tag{8.3} \]

for all combinations of $p = 1(1)6, 8, 10, 12, 15, 20$; $\nu = 3(1)12, 15,
20, 25, 30, 40, 60$; $\rho = 0, 0.2, 0.4, 0.5$; and $\gamma = 0.90, 0.95,
0.99$. As one would expect, these values are comparable to the positive
square root of the values given by Krishnaiah and Armitage (1966,
Section 8.2). Hahn and Hendrickson's computations use the approximation
that


ra = CU 
-8 (Se) | sna] h(z)dz, 


where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the
standard normal distribution and $h$ is the pdf of $\sqrt{\chi^2_\nu/\nu}$.


8.6 Siotani’s Percentage Points 


Siotani (1964) suggested two interesting approximations for computing
$d$ in (8.3). The first approximation is the value $d_1$ satisfying

\[ p \Pr(X_i^2 > d_1^2) = 1 - \gamma; \]

this approximation had been suggested previously by Dunn (1958, 1961).
By Bonferroni's inequalities, one notes that

\[ 1 - \gamma - \epsilon_1(\gamma, p) \le 1 - P(d_1) \le 1 - \gamma, \]

where

\[ \epsilon_1(\gamma, p) = \sum_{i<j} \Pr(X_i^2 > d_1^2,\ X_j^2 > d_1^2). \]

Thus, if $\epsilon_1(\gamma, p)$ is sufficiently small, then one can use
$d_1$ as a good estimate. A modified second approximation is the value
$d_2$ satisfying

\[ p \Pr(X_i^2 > d_2^2) = 1 - \gamma + \epsilon_1(\gamma, p). \]

This time, one notes that

\[ -\epsilon_2(\gamma, p) \le \gamma - P(d_2) \le \epsilon_3(\gamma, p), \]

where

\[ \epsilon_2(\gamma, p) = \sum_{i<j} \Pr(X_i^2 > d_2^2,\ X_j^2 > d_2^2) - \epsilon_1(\gamma, p) \]

and

\[ \epsilon_3(\gamma, p) = \sum_{i<j<k} \Pr(X_i^2 > d_2^2,\ X_j^2 > d_2^2,\ X_k^2 > d_2^2). \]

Since both $\epsilon_2(\gamma, p) \ge 0$ and $\epsilon_3(\gamma, p) \ge 0$,
the absolute value of $\gamma - P(d_2)$ may be expected to be
sufficiently small for the tail of the $p$-variate $t$ distribution to
correspond to $1 - \gamma$ for values of $\gamma \ge 0.95$. For the
particular case $p = 2$, $\mu_1 = \mu_2 = 0$, and the equicorrelation
structure $r_{ij} = \rho$, $i \ne j$, Siotani (1964) tabulated estimates
of the probability in (8.3) for all combinations of $d = 2.0(0.5)4.5$;
$\nu = 10(2)50(5)90, 100, 120, 150, 200, \infty$; and
$|\rho| = 0.0(0.1)0.9, 0.95$. He also illustrated applications to
interval estimation of the parameters in the model of a randomized
block design and for coefficients in a normal regression equation.
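The first approximation $d_1$ only requires univariate $t$ tail probabilities, since $\Pr(X_i^2 > d^2) = \Pr(|X_i| > d)$. A minimal Python sketch is given below; the function names, the Simpson grid, and the bisection bracket $[0, 50]$ are assumptions made for illustration and are not part of Siotani's paper.

```python
import math

def t_pdf(t, nu):
    """Density of the univariate Student's t with nu degrees of freedom."""
    log_c = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(t * t / nu))

def two_sided_tail(d, nu, n=2000):
    """Pr(|T| > d), using composite Simpson for the mass on [0, d]."""
    if d <= 0.0:
        return 1.0
    step = d / n
    total = t_pdf(0.0, nu) + t_pdf(d, nu)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * step, nu)
    return 1.0 - 2.0 * total * step / 3.0

def siotani_first_approximation(gamma, p, nu, hi=50.0, tol=1e-6):
    """Solve p * Pr(X_i^2 > d^2) = 1 - gamma for d by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p * two_sided_tail(mid, nu) > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(siotani_first_approximation(0.95, 3, 10))
```

By the Bonferroni bound above, the value returned satisfies $P(d_1) \ge \gamma$, so it is a conservative choice; the second approximation would additionally need the bivariate tail probabilities in $\epsilon_1(\gamma, p)$, which are not computed here.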


8.7 Graybill and Bowden’s Percentage Points 


Graybill and Bowden (1967) derived bounds for $d$ satisfying (8.3) for
the special case $p = 2$ and $\rho = 0$. In this special case (8.3)
becomes

\[ \Pr\{X_1^2 \le d^2,\ X_2^2 \le d^2\} = \gamma \]

or, equivalently,

\[ \Pr\{\max(X_1^2, X_2^2) \le d^2\} = \gamma. \]

But

\[ \Pr\{\max(X_1^2, X_2^2) \le d^2\} = \Pr\{X_1^2 + X_2^2 \le 2F_{\gamma,2,\nu-2}\} \]

and

\[ \Pr\{\max(X_1^2, X_2^2) \le F_{\gamma,2,\nu-2}\}
   \le \Pr\{X_1^2 + X_2^2 \le 2F_{\gamma,2,\nu-2}\}
   \le \Pr\{\max(X_1^2, X_2^2) \le 2F_{\gamma,2,\nu-2}\}, \]

where $F_{\gamma,2,\nu-2}$ is the percentage point of the $F$ distribution
with degrees of freedom 2 and $\nu - 2$ corresponding to $\gamma$. Hence,
one obtains

\[ F_{\gamma,2,\nu-2} \le d^2 \le 2F_{\gamma,2,\nu-2}, \]

the bounds given by Graybill and Bowden. In a related development,
McCann and Edwards (1996) obtained the following lower bound for the
left-hand side of (8.3) when the underlying correlation matrix $R$ is of
rank $r$

P(d) > 1- in f tog PEEN 


-2 _ 
+Fe-1,1 {e+} ) r q(s)ds (8.4) 
r-l1 
with

\[ \Lambda = \sum_{k=1}^{p-1} \arccos(r_{k,k+1}), \tag{8.5} \]

where Fm,n is the cdf of an F distribution with degrees of freedom m 
and n, and q denotes the pdf of \/F,,,/r. This inequality requires only 
the evaluation of a one-dimensional integral and depends on $R$ through
its rank $r$ and also through the constant $\Lambda$. If one writes
$R = AA^T$ for a $p \times r$ matrix $A$ of rank $r$ with rows $a_k$, then
it is interesting to note that the terms $\arccos(r_{k,k+1})$ in (8.5)
are the angles between consecutive $a_k$ vectors, which are points on an
$r$-dimensional sphere. It is also straightforward to show that the $d$
that sets the right-hand side of (8.4) equal to $\gamma$ is strictly
increasing in $\Lambda$. This implies that, as $\Lambda \to \infty$, one
has $d \to \sqrt{r F_{1-\gamma,r,\nu}}$, which is the percentage point
given by Scheffé's method. On the other hand, as $\Lambda \to 0$, one
has $d \to t_{(1-\gamma)/2,\nu}$, the percentage point of the univariate
Student's $t$ distribution corresponding to $(1 - \gamma)/2$. This is
intuitively pleasing because $\Lambda \to 0$ implies that correlations
in $R$ approach 1, in which case the $p$-dimensional distribution
becomes one-dimensional for all practical purposes.




8.8 Pillai and Ramachandran’s Percentage Points 


Pillai and Ramachandran (1954) tabulated solutions of (8.1) and (8.3) 
for 


• $p = 1(1)8$; $\nu = 3(1)10, 12, 14(1)16(2)20, 24, 30, 40, 60, 120,
  \infty$; $\gamma = 0.05$ and $\rho = 0$;
• $p = 1(1)8$; $\nu = 5(5)20, 24, 30, 40, 60, 120, \infty$;
  $\gamma = 0.05$ and $\rho = 0$,

respectively. For computing these percentage points, Pillai and
Ramachandran used the pdfs of

\[ U_p = \max(X_1, X_2, \ldots, X_p) \]

and $|U_p|$, which were derived as


pwy = 
f(u) = Tea” s aE 


xT (== *) ap! 


and 
Ff (up) = D S p+2k-1 ye (pt2k-+v) /2 
j T(v/2)r”/2 = p re Duk a 
p+2k+v ae 
xT (=) opt 


respectively, where the a’s and b’s are the coefficients of the expansions 


2 k 
(vr Lo aè] 
= exp (FE) fag? tally + ay? T | 
and 


v ? à ky? f i 
(eCa = stow (ME) peaa], 
0 


respectively. Note that ak") = (1/2)* and pt) =1. 


8.9 Dunnett’s Percentage Points 


Dunnett (1955) tabulated solutions of (8.1) and (8.3) for $p = 1(1)9$;
$\nu = 5(1)20, 24, 30, 40, 60, 120, \infty$; $\gamma = 0.01, 0.05$; and
$\rho = 0.5$. For solving (8.1), Dunnett evaluated the integral in (7.1)
by using tables of the multivariate normal cdf computed by the National
Bureau of Standards. For (8.3), Dunnett bounded $P(d)$ by

\[ P(d) \ge [\Pr(-d < X_1 < d,\ -d < X_2 < d)]^{p/2} \]

(Dunnett and Sobel, 1955) and evaluated the probability integral of the
bivariate $t$ distribution using expressions (6.3)-(6.4). In a later
paper, Dunnett (1964) obtained approximations for $d$ in (8.3) for all
$\rho$'s lying between 0 and 0.5.


8.10 Gupta and Sobel’s Percentage Points 


Gupta and Sobel (1957) solved (8.1) for the special case $\rho = 1/2$.
Note from (6.8) that

\[ P(d) = \Pr(Z \le \sqrt{2}\, d) \]

and that $Z = (M_p - Y)/S$ is asymptotically normal as both $\nu$ and
$p$ tend to infinity. This allows for the use of a technique developed
by Cornish and Fisher (1950) for computing the percentage points $d$
directly without first computing a table of probability integral values.
Applying their result, Gupta and Sobel arrived at


d = yyt a3l, +aglg + asle + asl. + a34 Ieda + of, tees, 


where yy is the percentage point of the standard normal distribution 
corresponding to y, a4 is the standardized cumulant defined in (6.12), 
and Ie, Ig, I,2, ... are tabulated in Table I of Cornish and Fisher (1950) 
for the probability levels $\gamma = 0.75, 0.90, 0.95, 0.975, 0.99,
0.995, 0.9975, 0.999$ and $0.9995$. Gupta and Sobel tabulated $d$ for all
combinations of $p + 1 = 2, 5, 10(1)16, 18, 20(5)40, 50$;
$\nu = 15(1)20, 24, 30, 36, 40, 48, 60, 80, 100, 120, 360, \infty$; and
$\gamma = 0.75, 0.9, 0.95, 0.975, 0.99$.

Gupta and Sobel (1957) also obtained several bounds for the percentage
point $d$ satisfying (8.3). An upper bound for $d$ is obtained by setting
(6.8) to be equal to $(1 + \gamma)/2$ while a lower bound is obtained by
setting (6.8) to be equal to $\gamma$. These bounds are best for large
$\gamma$'s. For smaller values of $\gamma$, Gupta and Sobel provided the
following lower bound


1/(2p 
NSTC uaa 
{P (1+ )} 
where Xoy is the percentage point of the chi-squared distribution (with 
p degrees of freedom) corresponding to y. 




8.11 Chen’s Percentage Points 


Chen (1979) provided an alternative formulation of percentage points d 
by solving the equation 


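\[ F(d, \ldots, d; \rho, \nu) - F(-d, \ldots, -d; \rho, \nu) = \gamma \tag{8.6} \]

for the special case $\rho = 0$, where

\[ F(x, \ldots, x; \rho, \nu) = \int_{-\infty}^{x} \cdots \int_{-\infty}^{x} f(t_1, \ldots, t_p; \nu, \rho)\, dt_1 \cdots dt_p \]

and $f$ is the joint pdf of a central $p$-variate $t$ distribution with
degrees of freedom $\nu$ and the equicorrelation structure
$r_{ij} = \rho$, $i \ne j$. Chen noted that (8.6) can be rewritten as

\[ \int_0^\infty \left\{ \Phi\left( \frac{dy}{\sqrt{\nu}} \right) - \Phi\left( -\frac{dy}{\sqrt{\nu}} \right) \right\}^{p} g(y)\, dy = \gamma, \tag{8.7} \]

where $\Phi(\cdot)$ is the cdf of the standard normal distribution and
$g$ denotes the pdf of $\sqrt{\chi^2_\nu}$; further, by a change of
variable, $z = y/\sqrt{2}$, (8.7) becomes

\[ \int_0^\infty \left\{ \Phi\left( \frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) - \Phi\left( -\frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) \right\}^{p}
   \frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)}\, dz = \gamma. \tag{8.8} \]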
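Using the fact that the tail integral

\[ \int_{10}^\infty \left\{ \Phi\left( \frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) - \Phi\left( -\frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) \right\}^{p}
   \frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)}\, dz
   \le \int_{10}^\infty \frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)}\, dz
   \ll 10^{-8} \]

for all $\nu \le 60$, Chen approximated the left-hand side of (8.8) by

\[ P(d) = \int_0^{10} \left\{ \Phi\left( \frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) - \Phi\left( -\frac{\sqrt{2}\, dz}{\sqrt{\nu}} \right) \right\}^{p}
   \frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)}\, dz \]

and found the value $d_0$ such that $P(d_0) = \gamma$. Tables of $d_0$ were
given for all combinations of $\gamma = 0.8, 0.9, 0.95, 0.99$;
$p = 2(1)20$; $\nu = 2(1)30(5)60$; and $\rho = 0$.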




8.12 Bowden and Graybill’s Percentage Points 


Bowden and Graybill (1966) presented percentage points of the bivariate
$t$ distribution when the percentage points are not necessarily equal,
that is, the case where

\[ P(d, g) = \int_{-d}^{d} \int_{-g}^{g} f(x_1, x_2)\, dx_2 dx_1 = \gamma. \]

Setting $D = d$ and $A = g/d$, one can rewrite

\[ P(d, g) = \int_{-D}^{D} \int_{-AD}^{AD} f(x_1, x_2)\, dx_2 dx_1, \]

which can be solved for D using the expressions (6.3) and (6.4). Bowden 
and Graybill computed D for all combinations of p — 2 = 4 (2) 16 (4) 24 
(6) 30 (10) 50; A = 0.5(0.1)1.5; | p |= 0.0(0.1)0.1(0.2)0.9; and y = 0.90, 
0.95. Trout and Chow (1972) extended this development for trivariate t 
distributions: Setting 


\[ \int_{-d}^{d} \int_{-g}^{g} \int_{-h}^{h} f(x_1, x_2, x_3)\, dx_3 dx_2 dx_1 = \gamma \]

and using the transformations $D = d$, $A = g/d$ and $B = h/d$, the
following expression is obtained

\[ \int_{-D}^{D} \int_{-AD}^{AD} \int_{-BD}^{BD} f(x_1, x_2, x_3)\, dx_3 dx_2 dx_1 = \gamma. \]


Tables for D were given for all combinations of v = 5(1)9(2)29; A = 
0.5(0.1)1.5; B= 0.5(0.5)1.5; Tii = 0.1(0.4)0.9; M12 = 713 = T23 = 0; and 
y = 0.05. 


8.13 Dunnett and Tamhane’s Percentage Points 


More recently, Dunnett and Tamhane (1992) extended Bowden and
Graybill's (1966) calculations for multivariate $t$ distributions of any
dimension with zero means and the equicorrelation structure
$r_{ij} = \rho$, $i \ne j$. Consider iid standard normal random variables
$Z_j$, $j = 0, 1, \ldots, p$ and let $S$ be a $\sqrt{\chi^2_\nu/\nu}$ random
variable independent of the $Z_j$. Then the random variables defined by

\[ X_j = \frac{\sqrt{1 - \rho}\, Z_j - \sqrt{\rho}\, Z_0}{S}, \qquad 1 \le j \le p, \]

have the desired multivariate $t$ distribution. Thus

\[ P = \Pr(X_1 \le d_1, \ldots, X_p \le d_p)
     = \int_0^\infty \int_{-\infty}^{\infty} \Pr(Z_j \le e_j;\ 1 \le j \le p)\, \phi(z)\, h(s)\, dz\, ds, \tag{8.9} \]

where $e_j = (d_j s + \sqrt{\rho}\, z)/\sqrt{1 - \rho}$, $\phi$ is the pdf of
the standard normal distribution and $h$ is the pdf of $S$. Dunnett and
Tamhane (1992) obtained the following recursive formula for evaluating
$\Phi_{1 \ldots p} = \Pr(Z_1 \le e_1, \ldots, Z_p \le e_p)$ in the integrand
of (8.9)

\[ \begin{aligned}
\Phi_{1 \ldots p} ={}& \Pr(Z_1 \le e_2, Z_2 \le e_3, \ldots, Z_{p-1} \le e_p)\, \Phi(e_1) \\
 &+ \Pr(Z_1 \le e_1, Z_2 \le e_3, \ldots, Z_{p-1} \le e_p)\, \{\Phi(e_2) - \Phi(e_1)\} \\
 &+ \cdots \\
 &+ \Pr(Z_1 \le e_1, Z_2 \le e_2, \ldots, Z_{p-1} \le e_{p-1})\, \{\Phi(e_p) - \Phi(e_{p-1})\},
\end{aligned} \tag{8.10} \]

where $\Phi(\cdot)$ denotes the cdf of the standard normal distribution.
They also suggested the following algorithm for computing
$\Phi_{1 \ldots p}$

Step 1: Calculate $\Phi_j = \Phi(e_j)$, for $j = 1, \ldots, p$, a total of
        $p$ terms.
Step 2: Calculate $\Phi_{jk} = \Phi_k \Phi_j + \Phi_j (\Phi_k - \Phi_j)$,
        for $1 \le j < k \le p$, a total of $\binom{p}{2}$ terms.
Step 3: Calculate $\Phi_{ijk} = \Phi_{jk} \Phi_i + \Phi_{ik} (\Phi_j - \Phi_i)
        + \Phi_{ij} (\Phi_k - \Phi_j)$, for $1 \le i < j < k \le p$, a total
        of $\binom{p}{3}$ terms.

Step p: Calculate $\Phi_{12 \ldots p} = \Phi_{2 \ldots p} \Phi_1
        + \Phi_{13 \ldots p} (\Phi_2 - \Phi_1) + \cdots
        + \Phi_{1 \ldots p-1} (\Phi_p - \Phi_{p-1})$.
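Steps 1 to $p$ can be sketched in Python as a memoized recursion on the ordered values $\Phi_1 \le \cdots \le \Phi_p$. The sketch rests on an interpretive assumption, suggested by the region decomposition used in the step-up procedure: that $\Phi_{1 \ldots p}$ is the probability that the ordered independent normal variables fall below the ordered limits $e_1 \le \cdots \le e_p$, which is equivalent to the same statement for ordered uniform variables and limits $\Phi_j = \Phi(e_j)$. The function name is hypothetical.

```python
from functools import lru_cache

def ordered_probability(phis):
    """Pr(U_(1) < phi_1, ..., U_(p) < phi_p) for iid uniform U's.

    phis must be nondecreasing values in [0, 1], e.g. phi_j = Phi(e_j).
    """
    @lru_cache(maxsize=None)
    def q(thresholds):
        if not thresholds:
            return 1.0
        total, previous = 0.0, 0.0
        for k, current in enumerate(thresholds):
            # One variable falls between the (k-1)th and kth limits; the
            # remaining variables face the limits with the kth one removed.
            total += (current - previous) * q(thresholds[:k] + thresholds[k + 1:])
            previous = current
        return total
    return q(tuple(phis))

print(ordered_probability([0.3, 0.7]))       # Step 2 with Phi_1 = 0.3, Phi_2 = 0.7
print(ordered_probability([0.2, 0.5, 1.0]))  # Step 3
```

Memoizing on the tuple of remaining limits means that each of the $2^p$ subsets is evaluated once, which mirrors the term counts $p$, $\binom{p}{2}$, $\binom{p}{3}$, ... listed in the steps above.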


The computational details of this algorithm can be found in Dunnett 
and Tamhane (1990). Kwong and Liu (2000) — using Kwong’s (2001b) 
lemma — proposed the following modification of (8.10) 


m 


m (5) Cna Jn (810 


where $m \ge 2$, $J_1 = \Phi(e_1)$, and $J_0 = 1$ (see also Kwong, 2001a).

There are three commonly known approaches for determining the
percentage points $d_1, \ldots, d_p$ in (8.9) (after setting
$P = \gamma$): the step-up procedure, the step-down procedure, and the
simulation approach (Dunnett and Tamhane, 1995). We shall describe them
below by means of recursive algorithms. Throughout we shall let
$X_{(1)} \le \cdots \le X_{(p)}$ denote the order statistics of
$X_1, \ldots, X_p$ and let $c_1, \ldots, c_p$ denote the corresponding
ordering of $d_1, \ldots, d_p$.


• Step-up procedure
   (i) Take $c_1$ to be the $100\gamma$ percentage point of the univariate
       Student's $t$ distribution with degrees of freedom $\nu$.
  (ii) Solve the equation

       \[ \Pr(X_{(1)} \le c_1, X_{(2)} \le c_2)
          = \Pr(X_1 \le c_1, X_2 \le c_2) + \Pr(c_1 < X_1 \le c_2, X_2 \le c_1)
          = \gamma \]

       for $c_2$ by evaluating the two bivariate probabilities using
       (8.9) and the value for $c_1$ defined in (i).
 (iii) Solve the equation

       \[ \begin{aligned}
       \Pr(X_{(1)} \le c_1, X_{(2)} \le c_2, X_{(3)} \le c_3)
        ={}& \Pr(X_1 \le c_1, X_2 \le c_2, X_3 \le c_3) \\
         &+ \Pr(X_1 \le c_1, c_2 < X_2 \le c_3, X_3 \le c_2) \\
         &+ \Pr(c_1 < X_1 \le c_2, X_2 \le c_1, X_3 \le c_3) \\
         &+ \Pr(c_1 < X_1 \le c_2, c_1 < X_2 \le c_3, X_3 \le c_1) \\
         &+ \Pr(c_2 < X_1 \le c_3, X_2 \le c_1, X_3 \le c_2) \\
         &+ \Pr(c_2 < X_1 \le c_3, c_1 < X_2 \le c_2, X_3 \le c_1) \\
        ={}& \gamma
       \end{aligned} \]

       for $c_3$ by evaluating the six trivariate probabilities using
       (8.9) and the values for $c_1$ and $c_2$ defined in (i) and (ii),
       respectively.
  (iv) In general, the recursive formula below defines how the region
       over which the probability must be evaluated can be subdivided
       to obtain probability expressions

       \[ \begin{aligned}
       [X_{(1)} \le c_1, \ldots, X_{(p)} \le c_p]
        ={}& \{X_1 \le c_1,\ [X_{(2)} \le c_2, \ldots, X_{(p)} \le c_p]\} \\
         &+ \{c_1 < X_1 \le c_2,\ [X_{(2)} \le c_1, X_{(3)} \le c_3, \ldots, X_{(p)} \le c_p]\} \\
         &+ \cdots \\
         &+ \{c_{p-1} < X_1 \le c_p,\ [X_{(2)} \le c_1, \ldots, X_{(p)} \le c_{p-1}]\},
       \end{aligned} \tag{8.12} \]

       where $X_{(2)} \le \cdots \le X_{(p)}$ denote the order statistics
       of $X_2, \ldots, X_p$ with $X_1$ separated out. Formula (8.12) is
       applied recursively to the terms enclosed within the square
       brackets. This leads to a division of the region into $p!$
       subregions that have rectangular boundaries, making it possible
       to evaluate the individual probabilities (using (8.9)).

• Step-down procedure: See Dunnett and Tamhane (1991) for a lucid
  description.


• Simulation approach: This approach is feasible provided that $p$ is not
  so large that sampling errors in the values of $c_3, \ldots, c_{p-1}$
  accumulate and render the estimated value of $c_p$ too uncertain to be
  of practical use. The procedure for estimating $c_m$ given the values
  of $c_1, \ldots, c_{m-1}$ (Edwards and Berry, 1987) is as follows.

   (i) Let Nr denote the total number of simulations to be performed
       and choose Nr so that No $= (1 - \gamma)(1 +$ Nr$)$ is an integer.

  (ii) Initialize a counter, Ne = No.

 (iii) For each simulation, draw $m$ standard normal deviates
       $Z_1, \ldots, Z_m$ having the desired correlation structure and,
       if $\nu$ is finite, a random $\chi^2_\nu/\nu$ variate $S^2$.

  (iv) Set $X_i = Z_i/S$ if $\nu$ is finite or $X_i = Z_i$ if
       $\nu = \infty$ and order the $X$ values to obtain the order
       statistics $X_{(1)} \le \cdots \le X_{(m)}$.

   (v) Check whether $X_{(1)} \le c_1, \ldots, X_{(m-1)} \le c_{m-1}$. If
       this is the case, store the value of $X_{(m)}$ and return to step
       (iii). Otherwise, decrease Ne by 1 and return to step (iii).

  (vi) After completing the Nr simulations, find the estimate of $c_m$
       by counting down Ne from the top of the ordered values of the
       stored $X_{(m)}$. Note that this approach is general and does not
       impose restrictions on the correlation structure $r_{ij}$.
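Steps (i) to (vi) can be sketched in Python as follows. For concreteness the sketch assumes the equicorrelation structure $r_{ij} = \rho$ with $0 \le \rho < 1$, generated as in (8.9), although the procedure itself is not restricted to it; the function name, the default number of simulations, and the seed are hypothetical choices.

```python
import math
import random

def estimate_next_critical_value(c_prev, gamma, nu, rho, n_total=9999, seed=0):
    """Estimate c_m from c_1, ..., c_{m-1} following steps (i)-(vi)."""
    rng = random.Random(seed)
    m = len(c_prev) + 1
    # steps (i)-(ii): the counter starts at (1 - gamma)(1 + n_total),
    # which should be an integer for the chosen n_total
    n_count = round((1.0 - gamma) * (n_total + 1))
    stored = []
    for _ in range(n_total):
        # step (iii): equicorrelated normals and, for finite nu, S
        y0 = rng.gauss(0.0, 1.0)
        z = [math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) - math.sqrt(rho) * y0
             for _ in range(m)]
        s = 1.0 if math.isinf(nu) else math.sqrt(
            rng.gammavariate(nu / 2.0, 2.0) / nu)
        x = sorted(zj / s for zj in z)                   # step (iv)
        # step (v): zip stops after the first m - 1 order statistics
        if all(xi <= ci for xi, ci in zip(x, c_prev)):
            stored.append(x[-1])
        else:
            n_count -= 1
    stored.sort()
    if not 1 <= n_count <= len(stored):
        raise ValueError("too few simulations for this gamma")
    return stored[-n_count]                              # step (vi)

c1_hat = estimate_next_critical_value([], 0.95, 10, 0.5)
print(c1_hat)
```

With an empty list of previous values the function estimates $c_1$, which should be close to the univariate $t$ percentage point of step (i) of the step-up procedure; this offers a simple check before moving on to $c_2, c_3, \ldots$.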


Dunnett and Tamhane (1992) computed values of $d_1, \ldots, d_p$ using
the step-up procedure for all combinations of $p = 2(1)8$;
$\nu = 10, 20, 30, \infty$; $\rho = 0, 0.1(0.2)0.5$; and $\gamma = 0.95$.
Kwong and Liu (2000), using the same procedure but with the
modification (8.11), computed values of $d_1, \ldots, d_p$ for all
combinations of $p = 9(1)20$; $\nu = 10, 20, 30, \infty$;
$\rho = 0.1, 0.3, 0.5$; and $\gamma = 0.95$.


8.14 Kwong and Liu’s Percentage Points 


In the case $r_{ij} = b_i b_j$, Kwong and Liu (2000) pointed out that
(8.9) can be generalized to

\[ P = \int_0^\infty \int_{-\infty}^{\infty} H_{1 \ldots p}\, \phi(z)\, h(s)\, dz\, ds, \]

where $H_{1 \ldots p} = \Pr(X_1 \le e_1, \ldots, X_p \le e_p)$,
$e_j = d_j s$ and the $X_i$ are independent normal random variables with
means $b_i z$ and variances $1 - b_i^2$. The recursive formula (8.10) and
the algorithm for computing it can be generalized in a natural manner.
Hence the step-up, step-down, or the simulation-based procedure can be
used to compute the percentage points $d_1, \ldots, d_p$. A fourth
procedure not discussed above is one based on approximation.

   (i) $c_1$ and $c_2$ can be determined as in the step-up procedure.

  (ii) To determine $c_3$, replace $r_{12}$, $r_{13}$, and $r_{23}$ by
       $\rho_3 = (r_{12} + r_{13} + r_{23})/3$. Taking this as the common
       $\rho$ and using the previous values $c_1$, $c_2$, apply the
       step-up procedure to estimate $c_3$.

 (iii) To determine $c_4$, replace $r_{12}$, $r_{13}$, $r_{14}$, $r_{23}$,
       $r_{24}$, and $r_{34}$ by $\rho_4 = (r_{12} + r_{13} + r_{14} + r_{23}
       + r_{24} + r_{34})/6$. Taking this as the common $\rho$ and using
       the previous values $c_1$, $c_2$, $c_3$, apply the step-up
       procedure to estimate $c_4$.

  (iv) In general, replace the $\binom{m}{2}$ correlation coefficients by
       their arithmetic average $\rho_m$ and use the previously
       calculated values of $c_1, \ldots, c_{m-1}$ to obtain an estimate
       for $c_m$ using the step-up procedure.

This procedure is similar to the ones presented in Dunnett (1985),
Hochberg and Tamhane (1987, page 146), and Iyengar (1988).
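The averaging in steps (ii) to (iv) of the approximation procedure is simple to state in code. The short Python sketch below computes $\rho_m$ from the leading $m \times m$ block of a correlation matrix stored as a list of lists; the function name and the example matrix are hypothetical.

```python
def average_correlation(r, m):
    """Arithmetic mean of the m*(m-1)/2 correlations r[i][j], i < j < m."""
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return sum(r[i][j] for i, j in pairs) / len(pairs)

# Illustrative 4 x 4 correlation matrix (values are made up)
r = [[1.0, 0.2, 0.4, 0.6],
     [0.2, 1.0, 0.3, 0.5],
     [0.4, 0.3, 1.0, 0.7],
     [0.6, 0.5, 0.7, 1.0]]
rho_3 = average_correlation(r, 3)  # (r12 + r13 + r23) / 3
rho_4 = average_correlation(r, 4)  # mean of all six correlations
print(rho_3, rho_4)
```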


8.15 Other Results 


Some other tabulations of percentage points of the multivariate ¢ distri- 
butions with the equicorrelation structure are contained in the following 
references. 


• Paulson (1952) for $p = 3, 6$ and $\rho = 0$.

• Dunnett and Sobel (1955) used the lower bounds (7.4), (7.5), (7.6), and (7.10) for $p = 3, 9$; $\nu = 5, \infty$; $\rho = 1/2$; and $\gamma = 0.50, 0.75, 0.95, 0.99$.

• Halperin et al. (1955) for $p = 3(1)10, 15, 20, 30, 40, 60$; $\nu = 3(1)10, 15, 20, 30, 40, 60, 120, \infty$; $\rho = 1 - 1/p$; and $\gamma = 0.95, 0.99$.

• Gupta (1963) for $p = 1(1)50$; $\nu = \infty$; $\rho = 1/2$; and $\gamma = 0.75, 0.9, 0.95, 0.975$.

• Milton (1963) for extensions of Gupta's (1963) tables for $\gamma$ ranging from 0.5 to 0.9999.

• Steffens (1969b) for $p = 2$.

• Dunn and Massey (1965) for $p = 2, 6, 10, 20$; $\nu = 4, 10, 30, \infty$; $\rho = 0.0(0.1)1.0$; and $\gamma = 0.5(0.1)0.9, 0.95, 0.975, 0.99$.

• Tong (1970) for a procedure to calculate conservative estimates of the percentage points for $p > 20$ using tabulated values for $p = 20$.

• Freeman and Kuzmack (1972) for $p = 6, 8, 10(5)30$; $N_0 = 10, 20$, mean (40-70), 50, 100, mean (90-500), 500; $\rho = 0$; and $\gamma = 0.90, 0.95, 0.99$, where $\nu = p(N_0 - 1)$.

• Gupta et al. (1973) for $p = 1(1)10(2)50$; $\nu = \infty$; $\rho = 0.1, 0.125, 0.2, 1/3, 0.375, 0.4, 1/2, 0.64, 0.625, 2/3, 0.7, 0.75, 0.8, 0.875, 0.9$; and $\gamma = 0.75, 0.9, 0.95, 0.975, 0.99$.

• Amos (1978) for $p = 100$; $\nu = 100$; $\rho = 0.01$; and $\gamma = 0.05, 0.1, \ldots, 0.95$.

• Ahner and Passing (1983) for $1 < p < 20$; $2 < \nu < 120$, $\nu = \infty$; $\rho = 0$; and $\gamma = 0.95, 0.99$.

• Bechhofer and Dunnett (1988) for the most comprehensive table to date for $p = 2(1)16, 18, 20$; $\nu = 2(1)30(5)50, 60(20)120, 200, \infty$; $\rho = 0.0(0.1)0.9, 1/(1 + \sqrt{p})$; and $\gamma = 0.80, 0.90, 0.95, 0.99$.

• Kwong and Iglewicz (1996) for $p = 4, 5$; $\nu = p(1)20, 24, 30(10)100, 120, \infty$; $\rho = -1/(p - 1)$; and $\gamma = 0.90, 0.95, 0.99$ (note that the correlation matrix is singular).


There has been relatively little work on the percentage points of multivariate $t$ distributions when the correlation structure is not equicorrelated. Apart from those mentioned above, four results known to us are


• In the case that the $(i,j)$th element of the inverse of $R$ is

$$r^{ij} = \begin{cases} 2p/(p+1), & \text{if } i = j, \\ -2/(p+1), & \text{if } i \neq j, \end{cases}$$

Goldberg and Levine (1946) computed the percentage point $d$ satisfying

$$\int_{-\infty}^{d} \cdots \int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu) \, dx_1 \cdots dx_p = \gamma \tag{8.13}$$

for combinations of $p = 3$; $\nu = 1(1)30(3)60(15)120, 150, 300, 600, \infty$; and $\gamma = 0.50, 0.75, 0.90, 0.95$. This seems to be the earliest paper on this topic.
In the case 


e ar 2min(i,j) f1- PZG), 


Freeman et al. (1967) computed the percentage point $d$ satisfying (8.13) for combinations of $p = 3(1)5$; $(\nu/p) + 1 = 10(10)100, 200, 500$; and $\gamma = 0.95$. See also Bechhofer et al. (1954) and Table 4 in Dunnett and Sobel (1954).
• In the case that the $(i,j)$th element of $R$ is

$$r_{ij} = \sqrt{\frac{n_i n_j}{(n_0 + n_i)(n_0 + n_j)}},$$

where $n_i$ denotes some treatment sample size, Dutt et al. (1976) computed $d$ in (8.13) for all combinations of $\gamma = 0.95, 0.99$ and $3 < n_i < 12$, $i = 0, 1, 2, 3$, with $p = 3$ and the degrees of freedom $\nu = \sum_i (n_i - 1)$. See also Dutt et al. (1975).
• In the case $r_{ij} = 0$ for all $i \neq j$ except that

$$r_{i,i+1} = -\sqrt{\frac{n_i n_{i+2}}{(n_i + n_{i+1})(n_{i+1} + n_{i+2})}}$$

for some treatment sample sizes $n_k$, Lee and Spurrier (1995) provided tables of one-sided and two-sided percentage points (of the form (8.13)) for $3 < p < 6$ and $n_k = n$ (the balanced case). Liu et al. (2000) extended these tables for $3 < p < 10$; $\nu = 5(1)8(2)20, 25, 30, 40, 60, 120, \infty$; and $\gamma = 0.90, 0.95, 0.99$. See also Somerville et al. (2001).


Calculations of percentage points for singular correlation structures of the form $r_{ij} = -b_i b_j$ are discussed in Spurrier and Isham (1985) and Kwong and Iglewicz (1996). The former provided tabulations of the percentage points for $p = 3$, $3 < n_1 < n_2 < n_3$, $10 < N < 29$, and $\gamma = 0.90, 0.95, 0.99$ when $b_k = \sqrt{n_k/(N - n_k)}$ with $N = n_1 + n_2 + n_3$ (where $n_k$ denotes some treatment sample size).

Calculations of percentage points for correlation structures more general than the decomposable structure $r_{ij} = b_i b_j$ are quite difficult and challenging. Even the generalization to quasi-decomposable structures of the form $r_{ij} = b_i b_j + \gamma_{ij}$ (Yang and Zhang, 1997), where the $\gamma_{ij}$'s are nonzero deviations for some $i$ and $j$, is a rather restrictive assumption. A solution is to find the matrix "closest" to a given $R$ that still possesses the decomposable correlation structure (Hsu, 1992; Hsu and Nelson, 1998). One also has the choice of adopting the simulation approach or one of the general approaches due to Somerville (1997, 1998b) and Genz and Bretz (1999), described in Section 6.10.


9 
Sampling Distributions 


Here, we shall consider sampling distributions of certain statistics associated with multivariate $t$ distributions.


9.1 Wishart Matrix 


Suppose $X_1, \ldots, X_n$ is a random sample from a $p$-variate $t$ distribution with the common pdf

$$f(x_i) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2) \, |R|^{1/2}} \left[ 1 + \frac{1}{\nu} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + p)/2}.$$

The joint pdf of the $n$ independent observations is given by

$$f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n). \tag{9.1}$$


However, it is more instructive to consider dependent but uncorrelated $t$ distributions. Joarder and Ahmed (1996) suggested the model

$$f(x_1, \ldots, x_n) = \frac{\Gamma((\nu + np)/2)}{(\pi\nu)^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}} \left[ 1 + \frac{1}{\nu} \sum_{i=1}^{n} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + np)/2}, \tag{9.2}$$

which they referred to as the multivariate $t$ model. Joarder and Ali (1997) remarked that this model can also be written as a scale mixture of multivariate normal distributions given by

$$f(x_1, \ldots, x_n) = \int_0^\infty \frac{|\tau^2 R|^{-n/2}}{(2\pi)^{np/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T (\tau^2 R)^{-1} (x_i - \mu) \right\} h(\tau) \, d\tau,$$

where $\tau$ has the inverted gamma distribution with the pdf

$$h(\tau) = \frac{2 \tau^{-(1 + \nu)}}{(2/\nu)^{\nu/2} \, \Gamma(\nu/2)} \exp\left( -\frac{\nu}{2\tau^2} \right) \tag{9.3}$$


for $\tau > 0$. Equivalently, $X_i \mid \tau$ has the multivariate normal distribution $N_p(\mu, \tau^2 R)$. Among others, Zellner (1976) and Sutradhar and Ali (1986) considered (9.2) in the context of stock market problems. By successive integration, one can show that the marginal distribution of $X_i$ in the multivariate $t$ model (9.2) is $p$-variate $t$. It also follows from (9.2) that $E(X_i - \mu)(X_j - \mu)^T = 0$ for $i \neq j$. Thus, in (9.2), although $X_1, \ldots, X_n$ are pairwise uncorrelated, they are not necessarily independent. More specifically, $X_1, \ldots, X_n$ in (9.2) are not independent if $\nu < \infty$, since independence would imply that $X_1, \ldots, X_n$ are normally distributed. The case of independent normally distributed random vectors can be included in (9.2) by letting $\nu \to \infty$. In the case $\nu = 1$, (9.2) is the multivariate Cauchy distribution for which neither the mean nor the variance exists. Kelejian and Prucha (1985) proved that (9.2) is better able to capture heavy-tailed behavior than an independent $t$ model given by (9.1).
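
The scale-mixture representation suggests a simple way to simulate from (9.2) and to see the "uncorrelated but dependent" property numerically. The sketch below assumes NumPy is available; the function name, the parameter values, and the use of squared coordinates as a dependence check are illustrative choices, not part of the cited papers. One $\tau$ is drawn per sample (via $\nu/\tau^2 \sim \chi^2_\nu$) and shared by all $n$ observations in that sample.

```python
import numpy as np


def sample_t_model(reps, n, mu, R, nu, rng):
    """Draw `reps` samples of size n from model (9.2) via its scale mixture.

    Returns an array of shape (reps, n, p). A single tau is shared by the
    n observations of each sample, which is what makes them dependent.
    """
    p = len(mu)
    # nu / tau^2 ~ chi-squared(nu), hence tau = sqrt(nu / chi2)
    tau = np.sqrt(nu / rng.chisquare(nu, size=(reps, 1, 1)))
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((reps, n, p))
    return mu + tau * (Z @ L.T)


rng = np.random.default_rng(0)
mu = np.zeros(2)
R = np.array([[1.0, 0.5], [0.5, 1.0]])
draws = sample_t_model(200_000, 2, mu, R, nu=10.0, rng=rng)

a = draws[:, 0, 0]  # first coordinate of X_1
b = draws[:, 1, 0]  # first coordinate of X_2
corr_raw = np.corrcoef(a, b)[0, 1]        # close to zero: uncorrelated
corr_sq = np.corrcoef(a**2, b**2)[0, 1]   # clearly positive: dependent
```

The positive correlation between the squares comes entirely from the shared $\tau$; under the independence model (9.1) it would vanish.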

The sampling quantities of interest are the mean vector and the sum of product matrix (Wishart matrix) given by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and

$$A = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T, \tag{9.4}$$

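respectively. Sutradhar and Ali (1989) derived the corresponding pdfs, which are

$$f(\bar{x}) = \frac{\nu^{\nu/2} \, \Gamma((\nu + p)/2)}{\pi^{p/2} \, \Gamma(\nu/2)} \, |R/n|^{-1/2} \left[ \nu + n (\bar{x} - \mu)^T R^{-1} (\bar{x} - \mu) \right]^{-(\nu + p)/2}$$

and

$$f(A) = \frac{\nu^{\nu/2} \, \Gamma((\nu + p(n-1))/2)}{\Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \, |R|^{-(n-1)/2} \, |A|^{(n-p-2)/2} \left[ \nu + \mathrm{tr}(R^{-1} A) \right]^{-(\nu + p(n-1))/2}, \tag{9.5}$$

respectively, where $A > 0$, $n > 1 + p$, and $\Gamma_p(z)$ is the generalized gamma function defined by

$$\Gamma_p(z) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\left( z - \frac{i-1}{2} \right). \tag{9.6}$$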

The distribution of the Wishart matrix, (9.5), has its applications in fac- 
tor analysis. More specifically, in practice, one may be confronted with 
the situation where the observed data have a symmetrical distribution 
with tails that are fatter than those predicted by the normal distribution.
In such cases, one could explicitly account for the observed “fat tails” by 
using the multivariate t model (9.2). Consequently, in factor analysis, 
analogous to the Wishart distribution, one may use the distribution of 
the sum of products matrix under (9.2). 


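It is easily checked that, as $\nu \to \infty$, the pdf (9.5) converges to

$$\frac{|R|^{-(n-1)/2} \, |A|^{(n-p-2)/2}}{2^{p(n-1)/2} \, \Gamma_p((n-1)/2)} \exp\left\{ -\frac{1}{2} \mathrm{tr}(R^{-1} A) \right\},$$

which is the pdf of the usual Wishart distribution $W_p(R, n)$. The pdf (9.5) can also be written as the mixture of distributions

$$f(A) = \int_0^\infty \frac{|\tau^2 R|^{-(n-1)/2} \, |A|^{(n-p-2)/2}}{2^{p(n-1)/2} \, \Gamma_p((n-1)/2)} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( (\tau^2 R)^{-1} A \right) \right\} h(\tau) \, d\tau,$$

where $\tau$ has the inverted gamma pdf (9.3). This is equivalent to saying that $A \mid \tau = \tau^2 W$ has the usual Wishart distribution $W_p(\tau^2 R, n)$.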
Joarder and Ali (1992) and Joarder (1998) derived various expectations of the Wishart matrix $A$. Specifically, one has the following expressions:

$$E(A) = \frac{n-1}{1 - 2/\nu} \, R, \qquad \nu > 2,$$

$$E(A^2) = \frac{(n-1)^2 R^2 + (n-1)\{R \, \mathrm{tr} R + R^2\}}{(1 - 2/\nu)(1 - 4/\nu)}, \qquad \nu > 4,$$

$$E\left( |A|^k \right) = \frac{\Gamma(\nu/2 - kp) \, \Gamma_p((n-1)/2 + k)}{\nu^{-kp} \, \Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \, |R|^k, \qquad \nu > 2kp,$$

$$E\left( |A|^k A \right) = \frac{((n-1)/2 + k) \, \Gamma(\nu/2 - kp - 1)}{\nu^{-kp-1} \, \Gamma(\nu/2)} \, \frac{\Gamma_p((n-1)/2 + k)}{\Gamma_p((n-1)/2)} \, |R|^k R, \qquad n + 2k > 1, \ \nu > 2(kp + 1),$$

$$E\left( |A|^k A^{-1} \right) = \frac{2 \, \Gamma(\nu/2 - kp + 1) \, \Gamma_p((n-1)/2 + k)}{\nu^{1-kp} (n + 2k - p - 2) \, \Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \, |R|^k R^{-1}, \qquad n + 2k > p + 2, \ \nu > 2(kp - 1),$$

$$E\left[ (\mathrm{tr} A)^2 \right] = \frac{n-1}{(1 - 2/\nu)(1 - 4/\nu)} \left[ (n-1)(\mathrm{tr} R)^2 + 2 \, \mathrm{tr}(R^2) \right], \qquad \nu > 4,$$

and

$$E\left[ \mathrm{tr}(A^2) \right] = \frac{n-1}{(1 - 2/\nu)(1 - 4/\nu)} \left[ n \, \mathrm{tr}(R^2) + (\mathrm{tr} R)^2 \right], \qquad \nu > 4,$$

where $k$ is any real number and $\nu > 0$. These expectations are important tools in developing estimation theories for the correlation matrix, inverted correlation matrix, trace of the correlation matrix, and other characteristics of the correlation matrix of the multivariate $t$ model under quadratic loss functions. Extensions of these expectations to the class of scale mixtures of normal distributions (which may be useful in inferential works having a $t$ distribution or the scale mixture of normal distributions as the parent population) are discussed below.
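
As a sanity check on the first of these expectations, $E(A) = (n-1)R/(1 - 2/\nu)$, the following sketch estimates $E(A)$ by Monte Carlo under model (9.2) and compares it with the closed form. It assumes NumPy; the helper names, the seed, and the parameter values are hypothetical choices made for the illustration.

```python
import numpy as np


def wishart_matrix(X):
    """Sum of products matrix A of (9.4) for an (n, p) data array."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered


def simulate_mean_of_A(reps, n, R, nu, rng):
    """Monte Carlo estimate of E(A) under model (9.2) with mu = 0."""
    p = R.shape[0]
    L = np.linalg.cholesky(R)
    total = np.zeros((p, p))
    for _ in range(reps):
        # one tau per sample: nu / tau^2 ~ chi-squared(nu)
        tau = np.sqrt(nu / rng.chisquare(nu))
        X = tau * (rng.standard_normal((n, p)) @ L.T)
        total += wishart_matrix(X)
    return total / reps


rng = np.random.default_rng(1)
R = np.array([[1.0, 0.3], [0.3, 1.0]])
n, nu = 6, 10.0
estimate = simulate_mean_of_A(40_000, n, R, nu, rng)
theory = (n - 1) / (1 - 2 / nu) * R
```

With these settings the inflation factor $1/(1 - 2/\nu)$ is 1.25, so the simulated mean should sit near $6.25 R$ rather than the normal-theory value $5R$.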

Sutradhar and Ali (1989) derived an elementwise expression for the variance-covariance matrix of $A$. Letting $m_{ij}$ denote the $(i,j)$th element of $R^{1/2}$, they showed that


v-2 P 
Cov (Aij, An) = 70-1) YO MiuMjuMkuMu 
u=1 
p 
+(n — 1)(n — 2) 5. MiuMjuMku Mu 
u=1 


+(n-— 1) > MiuMjuMevMiy 


uu 




2 
+(n- 1) >S MivMjyMkuMiu 
uu 


+(n = 1) X MiuMju (MkuMiv + MkuMiu) 
u<u 


+(n-1) ` MivMju (MkuMiy + masma) 


u<u 


p 
(n— 1)? > mam) 2 men) 
u=l 
fori #k,j Æl, i,j k,l =1,...,p, and 
yY — 
Var (A;) = = A n-1) (x: mame) + 2(n —1) >> mim), 


+(n- 1) bD (MiuMju + ma) 


u<u 


2 
-(n- 1) HS mum) 


for i,j =1,...,p 

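Let $A = TT^T$ and $A = SMS^T$ be the triangular and spectral decompositions of $A$. Let $W = UU^T$ be the triangular decomposition of $W \sim W_p(R, n)$. Let $l_1, \ldots, l_p$ and $m_1, \ldots, m_p$ denote the latent roots of $W$ and $A$, respectively. Also define

$$d_i = \frac{1}{n + p + 1 - 2i}$$

and

$$d_i^* = \frac{\nu - 2}{\nu} \, \frac{1}{n + p + 1 - 2i}$$

with $D = \mathrm{diag}(d_1, \ldots, d_p)$ and $D^* = \mathrm{diag}(d_1^*, \ldots, d_p^*)$. Then, some further expectation identities involving $A$ useful in the estimation of $R$ are

$$E[\log(|A|)] = E[\log(|W|)] + 2p \, E(\log \tau),$$

$$E\left[ \log\left( |R^{-1} A| \right) \right] = \sum_{i=1}^{p} E\left[ \log\left( \chi^2_{n+1-i} \right) \right] + 2p \, E(\log \tau),$$

and

$$E\left[ \mathrm{tr}\left( R^{-1} T \Delta T^T \right) \right] = \frac{\nu}{\nu - 2} \, E\left[ \mathrm{tr}\left( R^{-1} U \Delta U^T \right) \right]$$

(Joarder and Ali, 1997), where $\Delta$ is a positive definite diagonal matrix and $\tau$ has the inverted gamma distribution given by (9.3).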

Joarder and Ahmed (1998) considered a generalization of the multivariate $t$ model in (9.2) when the random sample $X_1, \ldots, X_n$ is assumed to come from a $p$-variate elliptical distribution with the pdf

$$f(x_i) = \int_0^\infty \frac{|\tau^2 R|^{-1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu)^T (\tau^2 R)^{-1} (x_i - \mu) \right\} h(\tau) \, d\tau,$$

where $h(\cdot)$ is the pdf of a nondiscrete random variable $\tau$. Many multivariate distributions having a constant pdf on the hyperellipse $(x - \mu)^T R^{-1} (x - \mu) = c^2$ may be generated by varying $h(\cdot)$. In this general case, the model corresponding to (9.2) has the joint pdf

$$f(x_1, \ldots, x_n) = \int_0^\infty \frac{|\tau^2 R|^{-n/2}}{(2\pi)^{np/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T (\tau^2 R)^{-1} (x_i - \mu) \right\} h(\tau) \, d\tau. \tag{9.7}$$
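The observations $X_1, \ldots, X_n$ are independent only if $\tau$ is degenerate at the point unity, in which case the joint pdf (9.7) denotes the pdf of the product of $n$ independent $p$-variate normal distributions each being $N_p(\mu, R)$. Furthermore, if $\nu/\tau^2$ has the chi-squared distribution with degrees of freedom $\nu$, then (9.7) reduces to (9.2). The pdf of the Wishart matrix $A$ under the generalized model (9.7) takes the form

$$f(A) = \int_0^\infty \frac{|\tau^2 R|^{-n/2} \, |A|^{(n-p-1)/2}}{2^{np/2} \, \Gamma_p(n/2)} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( (\tau^2 R)^{-1} A \right) \right\} h(\tau) \, d\tau,$$

where $A > 0$, $n > 1 + p$, and $\Gamma_p(\cdot)$ is as defined in (9.6). Some expectations of $A$ useful for estimating $R$ are

$$E\left( |A|^r \right) = 2^{pr} \, \frac{\Gamma_p(n/2 + r)}{\Gamma_p(n/2)} \, |R|^r \, \gamma_{2pr},$$

$$E\left( |A|^k A \right) = 2^{kp} (n + 2k) \, \frac{\Gamma_p(n/2 + k)}{\Gamma_p(n/2)} \, |R|^k R \, \gamma_{2kp+2},$$

and

$$E\left[ (\mathrm{tr} A)^2 \right] = n \gamma_4 \left[ n (\mathrm{tr} R)^2 + 2 \, \mathrm{tr}(R^2) \right],$$

where $r \in \mathbb{R}$, $k \in \mathbb{R}$, $n + 2k > 0$, $\gamma_{2pr} = E(\tau^{2pr}) > 0$, $\gamma_{2kp+2} = E(\tau^{2kp+2}) > 0$, and $\gamma_4 = E(\tau^4)$ (all assumed to exist).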

The pdfs of A in the real and complex cases — under the indepen- 
dence model (9.1) — were originally studied by Cornish (1955) and Gupta 
(1964), respectively. Nagarsenker (1975) provided a very detailed study 
of the distribution of A and its quadratic forms. He investigated both 
the noncentral real and the noncentral complex cases. 

Let $Y$ be a $p \times n$ matrix of iid normal random variables with means $E(Y_{ij}) = \mu_{ij}$ and covariance matrix $\sigma^2 R$. Assume that $S$ is an independent random variable having the $\sqrt{\sigma'^2 \chi^2_{2\nu}/(2\nu)}$ distribution. Then the noncentral version of $A$ is defined by

$$S^2 A_{ij} = \sum_{k=1}^{n} (Y_{ik} - \bar{Y}_i)(Y_{jk} - \bar{Y}_j),$$

where

$$\bar{Y}_i = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}.$$

In the real case, S?A has the noncentral Wishart distribution. In the 
complex case, it should be interpreted as a Hermitian positive defi- 
nite matrix having the noncentral complex Wishart distribution (James, 
1964). In the noncentral real case, Nagarsenker (1975) established that 
the pdf of A is given by the complicated expression involving zonal 
polynomials 


f(A) 
(orn) |R |-(n-/2} A |(n—p)/2 exp {-trR- ppt / (20?)} 
aP=1) (2v)P™—DPL, (n — 1)/2)F(v) 
: ` v (o') T (v +k + p(n — 1)/2) Cr (Ruu TRA) 
ko k k(n — 1)/2)x (4v0*)* {1+ (o”trR-!A) /2v0?} 


—1 
POD 4 6, (9.8) 
where K = {ky,..., km}, ki > ka > e > km > 0, ki +khot---+km = k, 
_ Pm(2,4) 
(z) = Tala)” 


T, (2,4) = TOME (e+ h)T (z+) T (24h - 5S), 


and $C_\kappa(T)$ are symmetric homogeneous polynomials of degree $k$ in the latent roots of $T$. In the particular case $\sigma = \sigma'$ and $\mu = 0$, (9.8) reduces to the expression given in Cornish (1955). If $B$ is an $(n-1) \times (n-1)$ symmetric positive definite matrix of full rank, then Nagarsenker (1975) established further that the quadratic form $Q = ABA^T$ has the formidable pdf


f(Q) 
(on PD | R ae | Q |(n—P)/2 
a1) 2P- DAT, (n ~1)/2) Fw) | BP? 


gy ee ee) ( Jp (PPD ev). 


tao kaloy" Cy (In-1) 
(9.9) 


In the particular case $\sigma = \sigma'$ and $B = I_{n-1}$, (9.9) reduces to equation (14) in Cornish (1955). Nagarsenker also provided the joint cdf and the moment generating function of $Q$ as well as the corresponding expressions for the noncentral complex case (which generalizes those given in Gupta, 1964).


9.2 Multivariate t Statistic 


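A random variable $X$ with iid copies $X_1, X_2, \ldots$ is said to be in the domain of attraction of the normal law if there exists $a_n \to \infty$ such that

$$\frac{1}{a_n} \sum_{i=1}^{n} (X_i - \mu) \to N(0, 1)$$

as $n \to \infty$. It is well known that, for $X$ in the domain of attraction of the normal law, the $t$ statistic defined by

$$T_n = \frac{\sum_{i=1}^{n} (X_i - \mu)}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \to N(0, 1) \tag{9.10}$$

as $n \to \infty$, where, as usual, $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$.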

Sepanski (1994, 1996) provided two multivariate analogs of (9.10). Let $X$ be a $p$-variate random vector with mean vector $\mu$ and covariance matrix $\Sigma$. Also let $X_1, X_2, \ldots$ be iid copies of $X$. Then $X$ is said to be in the domain of attraction of a $p$-variate normal law if there exists $a_n \to \infty$ such that

$$\frac{1}{a_n} \sum_{i=1}^{n} (X_i - \mu) \to N(0, C) \tag{9.11}$$

for some nonsingular matrix $C$. Sepanski (1994) defined the multivariate $t$ statistic by

$$T_n = D_n^{-1/2} \sum_{i=1}^{n} (X_i - \mu), \tag{9.12}$$

where, for some sequence $b_n > 0$, $D_n = C_n + b_n I$ and

$$C_n = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T.$$

Note that $C_n$ is symmetric nonnegative definite while $D_n$ is symmetric positive definite. Under the assumption that $X$ satisfies (9.11), Sepanski (1994) showed that $T_n \to N(0, I)$ as $n \to \infty$. Sepanski (1996) established the same limiting result under weaker conditions by taking

$$T_n = C_n^{-1/2} \sum_{i=1}^{n} (X_i - \mu) \tag{9.13}$$

and considering its behavior when $X$ is in the generalized domain of attraction of a normal law, which means that there exist matrices $A_n$ and vectors $\mu_n$ such that

$$A_n \sum_{i=1}^{n} X_i - \mu_n \to N(0, I) \tag{9.14}$$

as $n \to \infty$. See Hahn and Klass (1980a, 1980b) for several examples of random vectors satisfying this condition and for an algorithm for constructing the normalizing matrices $A_n$.


9.3 Hotelling’s T? Statistic 
A customary approach to the estimation/testing problem is based on the so-called Hotelling's $T^2$ statistic. It is defined by

$$H_n^2 = n^2 (\bar{X} - \mu)^T C_n^{-1} (\bar{X} - \mu).$$
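
A minimal numerical sketch of this statistic follows. It assumes NumPy and takes $C_n = \sum_i (X_i - \bar{X})(X_i - \bar{X})^T$, the unnormalized sum of products matrix; that choice is an assumption made here so that the statistic equals the squared length of the vector $T_n$ in (9.13). The function names are hypothetical.

```python
import numpy as np


def sum_of_products(X):
    """C_n, assumed here to be the unnormalized sum of products matrix."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered


def hotelling_h2(X, mu):
    """H_n^2 = n^2 (xbar - mu)^T C_n^{-1} (xbar - mu)."""
    n = X.shape[0]
    diff = X.mean(axis=0) - mu
    return n**2 * diff @ np.linalg.solve(sum_of_products(X), diff)


def sepanski_t(X, mu):
    """T_n = C_n^{-1/2} sum_i (X_i - mu), with a symmetric inverse root."""
    eigval, eigvec = np.linalg.eigh(sum_of_products(X))
    inv_root = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return inv_root @ (X - mu).sum(axis=0)


rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))
mu = np.zeros(3)
h2 = hotelling_h2(X, mu)
t_n = sepanski_t(X, mu)
```

Under this convention `h2` and `t_n @ t_n` agree up to rounding, which makes the link between Sections 9.2 and 9.3 explicit.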


Under normality, it is well known that $(n - p) H_n^2 / (p(n - 1))$ is distributed as an $F$ distribution with degrees of freedom $p$ and $n - p$ (see, for example, Anderson, 1984). The distribution of $H_n^2$ has been studied under a mixture of two normal distributions by Srivastava and Awan (1982) and Kabe and Gupta (1990). Iwashita (1997) investigated the asymptotics of $H_n^2$ under an elliptical distribution. Unfortunately, there are no direct results for the specific case of multivariate $t$ distributions. For completeness, however, we shall survey the results when $X$ has an elliptical distribution. In this case, the characteristic function of $X$ can be written as

$$\phi(t) = \exp\left\{ i t^T m \right\} \psi\left( t^T \Omega t \right) \tag{9.15}$$

for some nonnegative function $\psi$ (Kelker, 1970), and the parameter

$$\kappa = \frac{\psi''(0)}{\{\psi'(0)\}^2} - 1$$

controls the kurtosis of the distribution. Iwashita (1997) provided an asymptotic distribution of $H_n^2$ under the null hypothesis that $m = \mu$ and a local alternative of it. Up to the order of $1/n$, the asymptotic null pdf of $H_n^2$ is given by

$$f(x) = g_p(x) + \frac{1}{n} \sum_{j=0}^{2} c_j \, g_{p+2j}(x),$$

where

$$c_0 = -\frac{1}{4} p \{ p + \kappa (p + 2) \},$$

$$c_1 = -\frac{1}{2} p \{ 1 - \kappa (p + 2) \},$$

$$c_2 = \frac{1}{4} p (p + 2)(1 - \kappa),$$

and $g_k(\cdot)$ denotes the pdf of a chi-squared distribution with degrees of freedom $k$. Iwashita (1997) also derived the percentiles and approximate powers of the $H_n^2$ statistic. An asymptotic expansion of the cdf of $H_n^2$ under the two assumptions

(i) $E(\| Y \|^8) < \infty$, where $Y = \Sigma^{-1/2} (X - \mu)$ and $X$ is a $p \times 1$ random vector with mean vector $\mu$ and covariance matrix $\Sigma$;

(ii) the distribution of $Y = (Y_1, \ldots, Y_p)$ has an absolutely continuous component with a positive density on some nonempty open set $U$ such that $1, y_1, \ldots, y_p, y_1^2, y_1 y_2, \ldots, y_p^2$ are linearly independent on $U$

is given by Fujikoshi (1997). It takes the form

$$\Pr\left( H_n^2 \leq x \right) = G_p(x) + \frac{1}{n} \sum_{j=0}^{3} \beta_j \, G_{p+2j}(x) + o\left( \frac{1}{n} \right) \tag{9.16}$$

uniformly for all positive real numbers $x$, where $G_k(\cdot)$ denotes the cdf of a chi-squared distribution with degrees of freedom $k$. The coefficients $\beta_j$'s are given by


= ghey 7 ant 10 
Bo = 4? +z (x ) 44 3 
— hee DO ob a 
A = -zp- 5 («$ ) t3“ , 
1 1 1 
fo = 5p(p+2)—5 (ns?) - Gat, 
and 
wo ON Dey" 
Bs = x (st?) +5 (a0?) | 
where 


a 
wom 
fay 
w 
Il 
— 
a 
~ 
2 
5 
= 
<~ 
— 


KP = 5y klibik) klikk) 5 
i,j,k 


1 bogus 
Kí BS SO RUD, 
i,j 


and $\kappa_{i_1 \cdots i_j}$ are the $j$th cumulants of $Y$. If $X$ has the elliptical distribution given by (9.15), then $\beta_3$ vanishes and $\beta_k$ reduces to $c_k$ for $k = 0, 1, 2$. Kano (1995) obtained the same asymptotic expansion as in (9.16), using a different method.

It is well known that, for large samples, $H_n^2$ has a limiting chi-squared distribution with degrees of freedom $p$. The usual underlying assumption for this result is simply that $E \| X \|^2 < \infty$. More general limiting behavior of $H_n^2$ has been studied by Eaton and Efron (1970), Sepanski (1994), and Fujikoshi (1997). Sepanski (1994) showed that, under the assumption of the generalized domain of attraction (defined in (9.14)), the modified Hotelling's $T^2$ statistic

$$H_n^2 = n^2 (\bar{X} - \mu)^T D_n^{-1} (\bar{X} - \mu)$$

still has an asymptotic chi-squared distribution with degrees of freedom $p$. Eaton and Efron (1970) studied the distribution of $H_n^2$ when $X$ has orthant symmetry, that is, $X$ has the same distribution as $DX$ for any choice of the diagonal matrix $D$ with diagonal elements equal to $\pm 1$.
We shall now consider the Hotelling's $T^2$ statistic in the context of testing equality of means. Suppose $X_1 = (X_{11}, \ldots, X_{1n_1})^T$ and $X_2 = (X_{21}, \ldots, X_{2n_2})^T$ are two samples of size $n_1$ and $n_2$, respectively. In analogy with (9.2), assume that $X_1$ and $X_2$ have the joint pdf given by

$$f(X_1, X_2) = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + np)/2)}{\pi^{np/2} \, \Gamma(\nu/2)} \, |R|^{-n/2} \left[ \nu - 2 + \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^T R^{-1} (x_{ij} - \mu_i) \right]^{-(\nu + np)/2},$$

where $n = n_1 + n_2$. It is immediate that $X_{ij}$ is $p$-variate $t$ with mean vector $\mu_i$, correlation matrix $R$, and degrees of freedom $\nu$. Also, the elements of the combined sample of size $n = n_1 + n_2$ are pairwise uncorrelated. The Hotelling's $T^2$ for testing equality of means takes the form

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{X}_1 - \bar{X}_2)^T S_{\text{pooled}}^{-1} (\bar{X}_1 - \bar{X}_2),$$

where

$$S_{\text{pooled}} = \frac{1}{n_1 + n_2 - 2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^T.$$

Sutradhar (1990) derived the nonnull distribution of the T? statistic, 
given by the pdf 


P(E) = Žale- (0415-2) 


-p-1 
x By (« +5, mime) | (9.17) 


where 
T (k) (m)z"-! 


Pm) = tma 




and 
Ny N92 


Tp-l 
z = R — m). 
SE (Hı — Ha) (H — He) 


Note that, under $H_0 : \mu_1 = \mu_2$, where $\delta = 0$, the pdf of $T^2$ in (9.17)


reduces to 

se = g (pm), 
which implies that, under $H_0$, $T^2 (n_1 + n_2 - p - 1)/p$ has the usual $F$ distribution with degrees of freedom $p$ and $n_1 + n_2 - p - 1$. Thus the null distribution remains the same as in the normal case. Furthermore, the power of the Hotelling's $T^2$ test can be computed by using the nonnull pdf in (9.17).

Kozumi (1994) considered testing equality of means when the two samples $X_1$ and $X_2$ have mutually independent $t$ distributions with equal correlation matrices and equal degrees of freedom. When the sample sizes are equal (say, $n_1 = n_2 = n$) the $T^2$ statistic is given by

$$T^2 = n \bar{y}^T S_d^{-1} \bar{y},$$

where $\bar{y}$ and $S_d$ are, respectively, the sample mean and the sample covariance matrix of the differences $y_j = x_{1j} - x_{2j}$. For unequal sample sizes, assuming without loss of generality that $n_1 < n_2$, the $T^2$ statistic is given by

$$\tilde{T}^2 = n_1 \bar{z}^T S_z^{-1} \bar{z},$$

where

$$z_j = x_{1j} - \sqrt{\frac{n_1}{n_2}} \, x_{2j} + \frac{1}{\sqrt{n_1 n_2}} \sum_{k=1}^{n_1} x_{2k} - \frac{1}{n_2} \sum_{k=1}^{n_2} x_{2k}$$

and $\bar{z}$ and $S_z$ are the sample mean and the sample covariance matrix of the $z_j$'s. It should be noted that $\tilde{T}^2$ reduces to $T^2$ in the case $n_1 = n_2 = n$. Under $H_0 : \mu_1 = \mu_2$, $(n_1 - p) \tilde{T}^2 / (p(n_1 - 1))$ has the usual $F$ distribution with degrees of freedom $p$ and $n_1 - p$. The nonnull pdf of $\tilde{T}^2$ is given by the infinite sum involving Student's $t$ pdfs

n o mm = (n26)* Tlk +v) 
fE) = (nı — 1)T? C/A 2+ HB O/2 FF, (nı — p)/2) 


2 p/2+k— al: #2 =(nı /2+k) 
x s 1 3 
(— = 7 ( E Nı — z) 


xJ (z;n1, no, k,ô, v), 


where $\delta = (\mu_1 - \mu_2)^T R^{-1} (\mu_1 - \mu_2)$ and the integral


1 
J (a371,72,k,6,v) = J {z1 - z) t? {rade — 2) 
0 


—(k+v) 
w(r-B) art dz. 
ny ny 


Kozumi (1994) also provided an expression for the cdf of $\tilde{T}^2$ and calculated the powers of $\tilde{T}^2$ corresponding to the sizes $\alpha = 0.01, 0.05$, $p = 5$, and for various values of $n_1$, $n_2$, $\delta$, and $\nu$.
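
The construction of the $z_j$'s for unequal sample sizes can be sketched as follows. The code assumes NumPy, pairs the first $n_1$ observations of the second sample with the first sample, and takes the statistic to be $n_1 \bar{z}^T S_z^{-1} \bar{z}$ with the usual divisor $n_1 - 1$ in $S_z$; the function names and these conventions are assumptions made for the illustration.

```python
import numpy as np


def paired_z(X1, X2):
    """z_j built from samples of sizes n1 <= n2 (one observation per row)."""
    n1, n2 = len(X1), len(X2)
    if n1 > n2:
        raise ValueError("expects n1 <= n2; swap the samples otherwise")
    head = X2[:n1]  # the n1 observations of sample 2 that get paired
    return (X1
            - np.sqrt(n1 / n2) * head
            + head.sum(axis=0) / np.sqrt(n1 * n2)
            - X2.mean(axis=0))


def t2_unequal(X1, X2):
    """n1 * zbar^T S_z^{-1} zbar, with S_z the sample covariance of the z_j."""
    Z = paired_z(X1, X2)
    n1 = len(Z)
    zbar = Z.mean(axis=0)
    S_z = np.atleast_2d(np.cov(Z, rowvar=False))  # divisor n1 - 1
    return n1 * zbar @ np.linalg.solve(S_z, zbar)


rng = np.random.default_rng(6)
X1 = rng.standard_normal((8, 2)) + 0.5
X2 = rng.standard_normal((13, 2))
Z = paired_z(X1, X2)
stat = t2_unequal(X1, X2)
Z_equal = paired_z(X1, X2[:8])  # equal sizes: plain differences
```

Two properties are easy to confirm numerically: the mean of the $z_j$'s equals $\bar{x}_1 - \bar{x}_2$, and for $n_1 = n_2$ the $z_j$'s reduce to the differences $x_{1j} - x_{2j}$.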


9.4 Entropy and Kullback-Leibler Number 


The forms of entropy and Kullback-Leibler number for the multivariate 
t distribution were discussed earlier in Chapter 1 (see equations (1.27), 
(1.29), and (1.31)). Here, we shall discuss the corresponding sampling 
properties. 

The entropy for the central p-variate t involves the correlation matrix 
R, and it is known that the maximum likelihood estimator of R for a 
sample of n observations is based on the Wishart matrix A in (9.4). 
Hence it is of interest to consider the sampling properties of the difference $\delta = H(X; A) - H(X; R)$. Guerrero-Cusumano (1996a) derived the corresponding moment generating function, mean, variance, and some asymptotics. Specifically,


mo = wen (p48) (38) [oH] G) 


and 


e z y= B- pios {rip - v) 
+ N(0,p) 


as $n \to \infty$, where $\psi(\cdot)$ denotes the digamma function. Note that


H(X; R) = H(X; A) — E(6) is an unbiased estimator for H(X; R) 
with E(H,,) = 0 and Var(H,,) = Var(5). Also note that, as v > 00, 


i n-i 
E(6) > aA) + plog 2, 


coinciding with the result given in Ahmed and Gokhale (1989) for the multivariate normal distribution. The expression for $\mathrm{Var}(\delta)$ given above is also valid for the multivariate normal distribution since it is independent of $\nu$.

The Kullback-Leibler number for the central $p$-variate $t$ is given by (1.31). The corresponding maximum likelihood estimator for a sample


of n observations is 
A p y—2 
- =!) s 
a 2 og ( v ) 


Thus, the sampling quantity of interest is the difference 6 = T(X;R) — 
T(X; R). Guerrero-Cusumano (1996b, 1998) derived the corresponding 
moment generating function, cumulant generating function, cumulants, 
mean, variance, and some asymptotics. Specifically, 


mo = (Fa) oa") 
BERO 
Ko = Foe (55) Hoer (=) -1r (3) 


+ {leer ==) — logIT (A) te 


i=l 


T(X;R) = o- Flog 


Var(5) = 3 {po (=) y0 (=) l l 




and 
y+p 


2 

6 > X(v-+p)? 

as $n \to \infty$ and $\nu \to \infty$. Furthermore,
(n-1)T(X;R) > Xpp-1)/2 


and 


tr (B?) — 
vn—-15 > N (==) 
as n — oo, where B = Aaa? with A4 denoting the diagonal 
matrix of $A$. In the latter limit, it is assumed that $\nu$ is known. When $\nu$ is unknown, the limit still holds for $n$ sufficiently large. The exact distribution of $\delta$ is quite complicated to obtain in a closed form.


10 


Estimation 


The material in this chapter is of special interest to researchers attempt- 
ing to model various phenomena based on multivariate t distributions. 
We shall start with a popular result in the bivariate case. 


10.1 Tiku and Kambo’s Estimation Procedure 


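In Chapter 4, we studied a bivariate $t$ distribution due to Tiku and Kambo (1992) given by the joint pdf

$$f(x_1, x_2) \propto \frac{1}{\sigma_1 \sigma_2 \sqrt{k (1 - \rho^2)}} \left\{ 1 + \frac{(x_2 - \mu_2)^2}{k \sigma_2^2} \right\}^{-\nu} \exp\left[ -\frac{1}{2 \sigma_1^2 (1 - \rho^2)} \left\{ x_1 - \mu_1 - \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2) \right\}^2 \right]. \tag{10.1}$$

Here, we discuss estimation of the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ when $\nu$ is known. The method for estimating the location and scale parameters developed by Tiku and Suresh (1992) is used for this problem. For a random sample $\{(X_{1i}, X_{2i}), i = 1, \ldots, n\}$ from (10.1), the likelihood function is

$$L \propto \left\{ \sigma_1^2 \sigma_2^2 (1 - \rho^2) \right\}^{-n/2} \prod_{i=1}^{n} \left\{ 1 + \frac{(x_{2(i)} - \mu_2)^2}{k \sigma_2^2} \right\}^{-\nu} \exp\left[ -\frac{1}{2 \sigma_1^2 (1 - \rho^2)} \sum_{i=1}^{n} \left\{ x_{1[i]} - \mu_1 - \rho \frac{\sigma_1}{\sigma_2} \left( x_{2(i)} - \mu_2 \right) \right\}^2 \right],$$

where $k = 2\nu - 3$, $X_{2(i)}$, $i = 1, \ldots, n$, are the order statistics of $X_{2i}$, and $X_{1[i]}$, $i = 1, \ldots, n$, are the corresponding concomitant $X_1$ observations. Consider the following three situations: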


(i) Complete samples are available and $\nu$ is not too small ($\nu > 3$).

(ii) Complete samples are available but $\nu$ is small ($\nu < 3$).

(iii) A few smallest or a few largest $X_{2i}$ observations and the corresponding concomitant $X_{1[i]}$ are censored due to the constraints of an experiment. This situation arises in many practical settings. In a time mortality experiment, for example, $n$ mice are inoculated with a uniform culture of human tuberculosis. What is recorded is $X_{2(i)}$: the time to death of the first $A$ ($< n$) mice, and $X_{1[i]}$: the corresponding weights at the time of death.

These situations also arise in the context of ranking and selection (David, 1982). We provide some details of the inference for situation (i) as described in Tiku and Kambo (1992). Using a linear approximation of the likelihood based on the expected values of order statistics, it is shown that the maximum likelihood estimators are


~ uaa PO) / 
Į = £1 — = (Z2 — p2), 
02 
s T 
a ES 2 12 2 
ju = sit 2 (3- | 
2 \ 82 
fe = 2- = (ūīı —p), 
2 72 
s G 
Be 2 12 1 
m2 ē = 2 + 2 2 1}, 
Si Si 
and 
a _ $1202 
= 2 
$5 01 


where $(\bar{x}_1, \bar{x}_2)$ are the usual sample means, $(s_1^2, s_2^2)$ are the usual sample variances, and $s_{12}$ is the sample covariance. The estimators $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\sigma}_1$, $\hat{\sigma}_2$, and $\hat{\rho}$ are asymptotically unbiased and minimum variance bound estimators. The estimator $\hat{\sigma}_1^2$ is always real and positive while the estimator $\hat{\rho}$ always assumes values between $-1$ and $1$. The asymptotic variances and covariances of the estimators can be written as

$$V = \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix},$$


where 
y 1 o? Poo 2mv —nk / poa?o? poyo? 
ı = >- - > 
n \ pozoi o2 2vmno? poło: Os 


(10.2) 


is positive definite and is the asymptotic variance-covariance matrix of $(\hat{\mu}_1, \hat{\mu}_2)$ while


o? po\o2 po; (1 = P) 


Vo = ES p01 02 o3 po2 (1 a P) 
pai (1- ø) po2(1—p?) 2(1-#) 
5 prot p°o\02 Po (1- ø) 
mO p°0102 o3 por (1 — p°) 
n < 
Pa(l-P) po2(1—p?) P-e? 


is positive definite and is the asymptotic variance-covariance matrix of $(\hat{\sigma}_1, \hat{\sigma}_2, \hat{\rho})$. The parameters $m$ and $\delta$ are determined by the linear approximation of the likelihood. Interestingly, $\mathrm{Var}(\hat{\mu}_1)$ and $\mathrm{Var}(\hat{\mu}_2)$ decrease with increasing $\rho^2$ unless $\nu = \infty$. The first component on the right of (10.2) is the variance-covariance matrix of $\hat{\mu}_1$ and $\hat{\mu}_2$ under bivariate normality, and the first component on the right of (10.3) is the asymptotic variance-covariance matrix of $\hat{\sigma}_1$, $\hat{\sigma}_2$, and $\hat{\rho}$ under bivariate normality. The second components in (10.2) and (10.3) represent the effect of nonnormality due to the family (10.1). The asymptotic distribution of $\sqrt{n} (\hat{\mu}_1 - \mu_1, \hat{\mu}_2 - \mu_2)$ is bivariate normal with zero means and variance-covariance matrix $n V_1$. For testing $H_0 : (\mu_1, \mu_2) = (0, 0)$ versus $H_1 : (\mu_1, \mu_2) \neq (0, 0)$, a useful statistic is $T^2 = \hat{\mu}^T V_1^{-1} \hat{\mu}$ with $\hat{\mu} = (\hat{\mu}_1, \hat{\mu}_2)^T$, the asymptotic null distribution of which is chi-squared with degrees of freedom 2. The asymptotic nonnull distribution is noncentral chi-squared with degrees of freedom 2 and noncentrality parameter


2mv 2 
T (BE) (BY. 
kn 02 
where 


= = ral) OC 


Note that Às is the noncentrality parameter of the asymptotic nonnull 
distribution of the Hotelling’s T? statistic based on the sample means 


$(\bar{x}_1, \bar{x}_2)$, sample variances $(s_1^2, s_2^2)$, and the sample correlation coefficient $s_{12}/(s_1 s_2)$. Tiku and Kambo (1992) also provided evidence that the use of $T^2$ in place of the Hotelling's $T^2$ statistic can result in a substantial gain in power.


10.2 ML Estimation via EM Algorithm 


Consider fitting a $p$-variate $t$ distribution to data $x_1, \ldots, x_n$ with the log-likelihood function

$$L(\mu, R) = -\frac{n}{2} \log |R| - \frac{\nu + p}{2} \sum_{i=1}^{n} \log(\nu + s_i), \tag{10.4}$$

where $s_i = (x_i - \mu)^T R^{-1} (x_i - \mu)$ and $\nu$ is assumed to be fixed. Differentiating (10.4) with respect to $\mu$ and $R$ leads to the estimating equations

$$\mu = \mathrm{ave}\{w_i x_i\} / \mathrm{ave}\{w_i\} \tag{10.5}$$

and

$$R = \mathrm{ave}\left\{ w_i (x_i - \mu)(x_i - \mu)^T \right\}, \tag{10.6}$$

where $w_i = (\nu + p)/(\nu + s_i)$ and "ave" stands for the arithmetic average over $i = 1, 2, \ldots, n$. Note that equations (10.5)-(10.6) can be viewed as an adaptively weighted sample mean and sample covariance matrix where the weights depend on the Mahalanobis distance between $x_i$ and $\mu$. The weight function $w(s) = (\nu + p)/(\nu + s)$, where $s = (x - \mu)^T R^{-1} (x - \mu)$, is a decreasing function of $s$, so that the outlying observations are downweighted. Maronna (1976) proved, under general assumptions, the existence, uniqueness, consistency, and asymptotic normality of the solutions of (10.5)-(10.6). For instance, if there exists $a > 0$ such that, for every hyperplane $H$, $\Pr(H) \leq p/(\nu + p) - a$, then (10.5)-(10.6) has a unique solution. Also, every solution satisfies the consistency property that $\lim_{n \to \infty} (\hat{\mu}, \hat{R}) = (\mu, R)$ with probability 1.

The standard approach for solving (10.5)-(10.6) for $\mu$ and $R$ is the popular EM algorithm because of its simplicity and stable convergence (Dempster et al., 1977; Wu, 1983). The EM algorithm takes the form of iterative updates of (10.5)-(10.6), using the current estimates of $\mu$ and $R$ to generate the weights. The iterations take the form

$$\mu^{(m+1)} = \mathrm{ave}\left\{ w_i^{(m)} x_i \right\} / \mathrm{ave}\left\{ w_i^{(m)} \right\}$$

and

$$R^{(m+1)} = \mathrm{ave}\left\{ w_i^{(m)} \left( x_i - \mu^{(m+1)} \right) \left( x_i - \mu^{(m+1)} \right)^T \right\},$$

where

$$w_i^{(m)} = (\nu + p) \Big/ \left\{ \nu + \left( x_i - \mu^{(m)} \right)^T \left( R^{(m)} \right)^{-1} \left( x_i - \mu^{(m)} \right) \right\}.$$

This is known as the direct EM algorithm and is valid for any $\nu > 0$. For details of this algorithm see the pioneering papers of Dempster et al. (1977, 1980), Rubin (1983), and Little and Rubin (1987). Several variants of the above have been proposed in the literature, as summarized in the table below.
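
The direct EM iteration can be sketched compactly. The code below assumes NumPy, holds $\nu$ fixed, starts from the sample mean and covariance, and stops when successive estimates differ by less than a tolerance; the function name, starting values, and stopping rule are illustrative assumptions and not taken from the cited papers.

```python
import numpy as np


def mahalanobis_sq(X, mu, R):
    """s_i = (x_i - mu)^T R^{-1} (x_i - mu) for every row of X."""
    d = X - mu
    return np.einsum("ij,ij->i", d @ np.linalg.inv(R), d)


def fit_mvt_em(X, nu, n_iter=500, tol=1e-10):
    """Direct EM for (mu, R) of a p-variate t with known nu."""
    n, p = X.shape
    mu = X.mean(axis=0)                      # starting values (assumed)
    R = np.cov(X, rowvar=False, bias=True)
    for _ in range(n_iter):
        w = (nu + p) / (nu + mahalanobis_sq(X, mu, R))   # current weights
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()  # update (10.5)
        d = X - mu_new
        R_new = (w[:, None] * d).T @ d / n               # update (10.6)
        shift = max(np.abs(mu_new - mu).max(), np.abs(R_new - R).max())
        mu, R = mu_new, R_new
        if shift < tol:
            break
    return mu, R


# Simulated t data (independent observations), for illustration only.
rng = np.random.default_rng(2)
n, p, nu = 2000, 2, 5.0
true_mu = np.array([1.0, -2.0])
true_R = np.array([[2.0, 0.6], [0.6, 1.0]])
tau = np.sqrt(nu / rng.chisquare(nu, size=(n, 1)))
X = true_mu + tau * (rng.standard_normal((n, p)) @ np.linalg.cholesky(true_R).T)
mu_hat, R_hat = fit_mvt_em(X, nu)
```

At convergence the returned pair is a fixed point of (10.5)-(10.6): recomputing the weights from `mu_hat` and `R_hat` and averaging reproduces the same estimates.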


Algorithm        Primary References


Extended EM      Kent et al. (1994), Arslan et al. (1995)
Restricted EM    Arslan et al. (1995)
MC-ECM1          Liu and Rubin (1995)
MC-ECM2          Liu and Rubin (1995), Meng and van Dyk (1997)
ECME1            Liu and Rubin (1995), Liu (1997)
ECME2            Liu and Rubin (1995)
ECME3            Liu and Rubin (1995)
ECME4            Liu and Rubin (1995)
ECME5            Liu (1997)
PXEM             Liu et al. (1998)


, Meng and van Dyk (1997) 
, Liu (1997) 


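Consider the maximum likelihood (ML) estimation for a $g$-component mixture of $t$ distributions given by

$$f(x; \Psi) = \sum_{i=1}^{g} \pi_i f(x; \mu_i, R_i, \nu_i),$$

where

$$f(x; \mu_i, R_i, \nu_i) = \frac{\Gamma((\nu_i + p)/2)}{(\pi \nu_i)^{p/2} \, \Gamma(\nu_i/2) \, |R_i|^{1/2}} \left[ 1 + \frac{(x - \mu_i)^T R_i^{-1} (x - \mu_i)}{\nu_i} \right]^{-(\nu_i + p)/2},$$

$\Psi = (\pi_1, \ldots, \pi_{g-1}, \theta^T, \nu^T)^T$, $\theta = ((\mu_1, R_1)^T, \ldots, (\mu_g, R_g)^T)^T$, and $\nu = (\nu_1, \ldots, \nu_g)^T$. The application of the EM algorithm for this model in a clustering context has been considered by McLachlan and Peel (1998) and Peel and McLachlan (2000). The iteration updates now take the form

$$\mu_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} u_{ij}^{(m)} x_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)} u_{ij}^{(m)}$$

and

$$R_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} u_{ij}^{(m)} \left( x_j - \mu_i^{(m+1)} \right) \left( x_j - \mu_i^{(m+1)} \right)^T \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)},$$

where

$$u_{ij}^{(m)} = \frac{\nu_i^{(m)} + p}{\nu_i^{(m)} + \left( x_j - \mu_i^{(m)} \right)^T \left( R_i^{(m)} \right)^{-1} \left( x_j - \mu_i^{(m)} \right)}$$

and

$$\tau_{ij}^{(m)} = \frac{\pi_i^{(m)} f\left( x_j; \mu_i^{(m)}, R_i^{(m)}, \nu_i^{(m)} \right)}{f\left( x_j; \Psi^{(m)} \right)}.$$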

The EMMIX program of McLachlan et al. (1999) for the fitting of nor- 
mal mixture models has an option that implements the above procedure 
for the fitting of mixtures of t-components. The program automatically 
generates a selection of starting values for the fitting if they are not 
provided by the user. The user only has to provide the data set, the 
restrictions on the component-covariance matrices (equal, unequal, di- 
agonal), the extent of the selection of the initial groupings to be used to 
determine the starting values, and the number of components that are 
to be fitted. The program is available from the software archive StatLib
or from Professor McLachlan's homepage at the Web site address


http://www.maths.uq.edu.au/~gjm/ 


10.3 Missing Data Imputation 


When a data set contains missing values, multiple imputation for missing 
data (Rubin, 1987) appears to be an ideal technique. Most importantly, 
it allows for valid statistical inferences. In contrast, any single
imputation method, such as filling in the missing values with either their
marginal means or their predicted values from linear regression,
typically leads to biased estimates of parameters and thereby often to an
invalid inference (Rubin, 1987, pages 11-15).


The multivariate normal distribution has been a popular statistical 
model in practice for rectangular continuous data sets. To impute the 
missing values in an incomplete normal data set, Rubin and Schafer 
(1990) (see also Schafer, 1997, and Liu, 1993) proposed an efficient 
method, called monotone data augmentation (MDA), and implemented 
it using the factorized likelihood approach. A more efficient technique to 
implement the MDA than the factorized likelihood approach is provided 
by Liu (1993) using Bartlett’s decomposition, which is the extension of 
the Bayesian version of Bartlett’s decomposition of the Wishart distribu- 
tion with complete rectangular normal data to the case with monotone 
ignorable missing data. 


When a rectangular continuous data set appears to have longer tails
than the normal distribution, or it contains some values that are
influential for statistical inferences with the normal distribution, the
multivariate t distribution becomes useful for multiple imputation as an
alternative to the multivariate normal distribution. First, when the data
have longer tails than the normal distribution, the multiply imputed data
sets using the t distribution allow more valid statistical inferences than
those using the normal distribution with some "influential" observations
deleted. Second, it is well known that the t distribution is widely used
in applied statistics for robust statistical inferences. Therefore, when an
incomplete data set contains some influential values or outliers, the t
distribution allows for a robust multiple imputation method. Furthermore,
the multiple imputation appears to be more useful than the asymptotic
method of inference since the likelihood functions of the parameters of
the t distribution given the observed data can have multiple modes. For
a complete description of the MDA using the multivariate t distribution,
see Liu (1995). See also Liu (1996) for extensions in two aspects:
including covariates in the multivariate t models (as in Liu and Rubin, 1995),
and replacing the multivariate t distribution with a more general class 
of distributions, that is, the class of normal/independent distributions 
(as in Lange and Sinsheimer, 1993). These extensions provide a flexi- 
ble class of models for robust multivariate linear regression and multiple 
imputation. Liu (1996) described methods to implement the MDA for 
these models with fully observed predictor variables and possible missing 
values from outcome variables. 




10.4 Laplacian T-Approximation 


The Laplacian T-approximation (Sun et al., 1996) is a useful tool for
Bayesian inferences for variance component models. Let $p(\theta \mid y)$ be the
posterior pdf of $\theta = (\theta_1, \ldots, \theta_p)^T$ given data $y$, and let $\eta = g(\theta)$ be
the parameter of interest. Leonard et al. (1994) introduced a Laplacian
T-approximation for the marginal posterior of $\eta$ of the form


p* (nly) «x T, p(Only) APF (njw, 03, T) (10.7) 


to be the marginal posterior pdf of $\eta$, where
Qn, 


w 
T ——— 
a" (w+ p)Ay 
T aiy 
5 ont, 
7 w+p-1’ 
21,17 
7 w+p-1’ 


dlogp (8 |y) 
066-0, 


3” logp (8 | y) 

T $ 

a (007) 0-0, 
ee s 0n +Q7'1,; 


n 


and $f(\eta \mid w, \theta^*_\eta, T_\eta)$ denotes the pdf of $\eta = g(\theta)$ when $\theta$ possesses
a multivariate t distribution with mean vector $\theta^*_\eta$, covariance matrix
$T_\eta$, and degrees of freedom $w$. Here, $\theta_\eta$ represents some convenient
approximation to the conditional posterior mean vector of $\theta$, given $\eta$, and
$w$ should be taken to roughly approximate the degrees of freedom of a
generalized multivariate T-approximation to the conditional distribution
of $\theta$ given $\eta$.

When $\theta_\eta$ is the conditional posterior mode vector of $\theta$, given $\eta$, (10.7)
reduces to the Laplacian approximation introduced by Leonard (1982)
and shown by Tierney and Kadane (1986) and Leonard et al. (1989)
to possess saddlepoint accuracy as well as an excellent finite-sample
accuracy in many special cases. It was previously used for hierarchical
models by Kass and Steffey (1989).




In the special case where $\eta = a^T \theta$ is a linear combination of the $\theta_i$'s,
the approximation (10.7) is equivalent to


-1/ lw š -1 
p (nly) o (En p (Only) Az Ot, (w,aT0;, (aTa) ~>), 


where t„(w, u, T) denotes a generalized t pdf. 


10.5 Sutradhar’s Score Test 
Consider a random sample $X_1, \ldots, X_n$ from a p-variate t distribution
with the pdf

\[ f(x_j) = \frac{(\nu - 2)^{\nu/2}\, \Gamma((\nu + p)/2)}{\pi^{p/2}\, \Gamma(\nu/2)\, |R|^{1/2}} \left[ \nu - 2 + (x_j - \mu)^T R^{-1} (x_j - \mu) \right]^{-(\nu + p)/2}. \]

Note this is a slight reparameterization of the usual t pdf. The
log-likelihood

\[ G = \sum_{j=1}^{n} \log f(x_j) \]

is a function of the parameters $R$, $\mu$, and $\nu$.

Frequently in social sciences, and particularly in factor analysis, one
of the main inference problems is to test the null hypothesis $R = R_0$
when $\mu$ and $\nu$ are unknown. Sutradhar (1993) proposed Neyman's (1959)
score test for this problem for large n. Let $r = (r_{11}, \ldots, r_{hl}, \ldots, r_{pp})^T$ be the
$p(p+1)/2 \times 1$ vector formed by stacking the distinct elements of $R$, with
$r_{hl}$ being the (h,l)th element of the $p \times p$ matrix $R$. Also let


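\[ (\lambda_1, \ldots, \lambda_i, \ldots, \lambda_{p(p+1)/2})^T = b(r_0, \hat{\mu}, \hat{\nu}) \]

and

\[ \gamma = (\gamma_1, \ldots, \gamma_{p+1})^T = \left[ \xi(r_0, \hat{\mu}, \hat{\nu})^T, \; \eta(r_0, \hat{\mu}, \hat{\nu}) \right]^T, \]

where $b(r_0, \hat{\mu}, \hat{\nu})$, $\xi(r_0, \hat{\mu}, \hat{\nu})$, and $\eta(r_0, \hat{\mu}, \hat{\nu})$ are the score functions
obtained under the null hypothesis $r = r_0$, by replacing $\mu$ and $\nu$ with their
consistent estimates $\hat{\mu}$ and $\hat{\nu}$ in

\[ b(r, \mu, \nu) = \frac{\partial G}{\partial r}, \tag{10.8} \]

\[ \xi(r, \mu, \nu) = \frac{\partial G}{\partial \mu}, \tag{10.9} \]

and

\[ \eta(r, \mu, \nu) = \frac{\partial G}{\partial \nu}, \tag{10.10} \]

respectively. Furthermore, let $T_i(r_0, \hat{\mu}, \hat{\nu}) = \lambda_i - \sum_{j=1}^{p+1} \beta_{ij} \gamma_j$, where $\beta_{ij}$
is the partial regression coefficient of $\lambda_i$ on $\gamma_j$. Then, Neyman's partial
score test statistic is given by

\[ W(\hat{\mu}, \hat{\nu}) = T^T \left[ \hat{M}_{11} - (\hat{M}_{12} \; \hat{M}_{13}) \begin{pmatrix} \hat{M}_{22} & \hat{M}_{23} \\ \hat{M}_{23}^T & \hat{M}_{33} \end{pmatrix}^{-1} \begin{pmatrix} \hat{M}_{12}^T \\ \hat{M}_{13}^T \end{pmatrix} \right]^{-1} T, \tag{10.11} \]

where $T = [T_1(r_0, \hat{\mu}, \hat{\nu}), \ldots, T_{p(p+1)/2}(r_0, \hat{\mu}, \hat{\nu})]^T$; for $i, r = 1, 2, 3$, $\hat{M}_{ir}$
are obtained from $M_{ir} = E(-D_{ir})$ by replacing $\mu$ and $\nu$ with their
consistent estimates; and $D_{ir}$ for $i, r = 1, 2, 3$ are the derivatives given
by

\[ D_{11} = \frac{\partial^2 G}{\partial r\, \partial r^T}, \quad D_{12} = \frac{\partial^2 G}{\partial r\, \partial \mu^T}, \quad D_{13} = \frac{\partial^2 G}{\partial r\, \partial \nu}, \]

\[ D_{22} = \frac{\partial^2 G}{\partial \mu\, \partial \mu^T}, \quad D_{23} = \frac{\partial^2 G}{\partial \mu\, \partial \nu}, \]

and

\[ D_{33} = \frac{\partial^2 G}{\partial \nu^2}. \]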

Under the null hypothesis $r = r_0$, the test statistic $W(\hat{\mu}, \hat{\nu})$ has an
approximate chi-squared distribution with $p(p+1)/2$ degrees of freedom.
The test based on (10.11) is asymptotically locally most powerful.
Clearly the implementation of this test requires consistent estimates
$\hat{\mu}$ and $\hat{\nu}$ as well as expressions for the score functions and the information
matrix. The maximum likelihood estimates of $\mu$ and $\nu$ are obtained by
simultaneously solving


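\[ \hat{\mu} = \sum_{j=1}^{n} q_j^{-1} X_j \Big/ \sum_{j=1}^{n} q_j^{-1} \]

and

\[ \eta(r_0, \hat{\mu}, \hat{\nu}) = 0, \]

where $q_j = \hat{\nu} - 2 + (X_j - \hat{\mu})^T R_0^{-1} (X_j - \hat{\mu})$ and $R_0$ is specified by the
null hypothesis. The moment estimates of $\mu$ and $\nu$ (which also turn out
to be consistent) are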


j=1 
and 
_ 2{2Ba ~ f (ro)} 
7 Bo — f (ro) 
where 
a 1 
Ê = = HES TR (X -3| 


j=l 


is a consistent estimator of the multivariate measure of skewness (see, 
for example, Mardia, 1970b), and 


P 


BD 08) {ADP + D> O N 


h=1 hh! 


where 7°), and rbh denote the (h, h')th element of Ro and Rj", respec- 
tively. 


10.5.1 Score Functions 
The score functions defined in (10.8), (10.9), and (10.10) are given by 


R 1 aS i 
blr, ñ, D) = -3 ryt oom Sata h 


j=l 


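\[ \xi(r, \mu, \nu) = (\nu + p)\, R^{-1} \sum_{j=1}^{n} q_j^{-1} (X_j - \mu), \]

and

\[ \eta(r, \mu, \nu) = \frac{n}{2} \left[ \log(\nu - 2) + \frac{\nu}{\nu - 2} + \psi\!\left( \frac{\nu + p}{2} \right) - \psi\!\left( \frac{\nu}{2} \right) \right] - \frac{1}{2} \sum_{j=1}^{n} \left\{ \log q_j + \frac{\nu + p}{q_j} \right\}, \]

respectively, where $\psi(\cdot)$ denotes the digamma function and $q_j$ is a
nonhomogeneous quadratic form given by $q_j = \nu - 2 + \mathrm{tr}(R^{-1} B_j)$ with $B_j =
(X_j - \mu)(X_j - \mu)^T$.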
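A score function can be checked numerically against finite differences of the log-likelihood. The sketch below does this for the scores with respect to $\mu$ and $\nu$ in the univariate case $p = 1$ (so that $R$ is a scalar `r`), using the reparameterized pdf of this section. The function names and the small series-based `digamma` helper are illustrative choices for this example, not part of Sutradhar's paper.

```python
import math

def digamma(x):
    """Digamma via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def log_lik(x, mu, r, nu):
    """Log-likelihood G for p = 1 under the (nu - 2) parameterization."""
    n = len(x)
    const = (0.5 * nu * math.log(nu - 2) + math.lgamma((nu + 1) / 2)
             - 0.5 * math.log(math.pi) - math.lgamma(nu / 2) - 0.5 * math.log(r))
    return n * const - 0.5 * (nu + 1) * sum(
        math.log(nu - 2 + (xj - mu) ** 2 / r) for xj in x)

def score_mu(x, mu, r, nu):
    """dG/dmu = (nu + p) R^{-1} sum_j (x_j - mu) / q_j with p = 1."""
    return (nu + 1) / r * sum((xj - mu) / (nu - 2 + (xj - mu) ** 2 / r) for xj in x)

def score_nu(x, mu, r, nu):
    """dG/dnu with p = 1."""
    n = len(x)
    q = [nu - 2 + (xj - mu) ** 2 / r for xj in x]
    head = 0.5 * n * (math.log(nu - 2) + nu / (nu - 2)
                      + digamma((nu + 1) / 2) - digamma(nu / 2))
    return head - 0.5 * sum(math.log(qj) + (nu + 1) / qj for qj in q)
```

Comparing `score_mu` and `score_nu` with central differences of `log_lik` is a quick guard against sign or factor errors before the scores are plugged into a test statistic.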


10.5.2 Information Matrix
By taking the second derivatives and then applying expectations, one 


can derive the elements of the information matrix. The first element 
takes the complicated form 


Mi; = [m*(1,1),m*(1,2),...,m*(h,l),...,m*(p,p)], 


where, for l > h, h,l = 1,...,p, m*(h,1) is the p(p + 1)/2-dimensional 
vector, formed by stacking the distinct elements of the p x p symmetric 
matrix 


+ 
Mh. = > [h ® (r')"] = ny +P) p) R Qa, R}. 


Here, r* denotes the kth column of the R`! matrix, and the (u, v)th 
element of the p x p matrix Qn, is given by 


(v + 2)? S YE rhit (r Tuv + Tiufkv +TivfTku) 
2z en SE tS eee ; alk wT ke 
(v+4)2(v+ p)\(v +p 2) ES ikuv iuf kv ivfku) 
where $r^{ms}$ and $r_{ms}$ denote the (m, s)th element of $R^{-1}$ and $R$,
respectively. The second element of the information matrix, $M_{12}$, is zero. The
third element $M_{13}$ is formed by stacking the distinct elements of the
symmetric matrix

\[ \frac{n(p + 2)}{(\nu - 2)(\nu + p)(\nu + p + 2)}\, R^{-1}. \]

The remaining elements of the information matrix are given by

\[ M_{22} = \frac{n \nu (\nu + p)}{(\nu - 2)(\nu + p + 2)}\, R^{-1} \]

and

\[ M_{33} = -n \left[ \frac{1}{4} \psi'\!\left( \frac{\nu + p}{2} \right) - \frac{1}{4} \psi'\!\left( \frac{\nu}{2} \right) + \frac{\nu - 4}{2(\nu - 2)^2} \right] + \frac{1}{2}\, \frac{n \nu (\nu^2 + \nu p - 6p - 2\nu - 8)}{(\nu - 2)^2 (\nu + p)(\nu + p + 2)}. \]

10.6 Multivariate t Model 


Consider the following multivariate t model described in equation (9.2)
of the preceding chapter

\[ f(x_1, \ldots, x_n) = \frac{\Gamma((\nu + np)/2)}{(\pi \nu)^{np/2}\, \Gamma(\nu/2)\, |R|^{n/2}} \left[ 1 + \frac{1}{\nu} \sum_{i=1}^{n} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + np)/2}. \tag{10.12} \]

In this section, we consider estimation issues associated with the
correlation matrix $R$ and its trace tr($R$).


10.6.1 Estimation of R 


Joarder and Ali (1997) developed estimators of $R$ (when the mean vector
$\mu$ is unknown) under the entropy loss function

\[ L(u(A), R) = \mathrm{tr}(R^{-1} u(A)) - \log |R^{-1} u(A)| - p, \]


where u(A) is any estimator of R based on the Wishart matrix A defined
in (9.4). Based on the form of the likelihood function, the entropy loss
function has been suggested in the literature by James and Stein (1961)
and is sometimes known as the Stein loss function. Some important
features of the entropy loss function are that it is zero if the estimator
u(A) equals the parameter R, positive when $u(A) \neq R$, and invariant
under translation as well as under a natural group of transformations of
covariance matrices. Moreover, the loss function approaches infinity as
the estimator approaches a singular matrix or when one or more elements
(or one or more latent roots) of the estimator approaches infinity. This
means that gross underestimation is penalized just as heavily as gross
overestimation.
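The listed properties of the entropy (Stein) loss are easy to verify numerically. The following sketch, with the illustrative function name `entropy_loss`, evaluates $\mathrm{tr}(R^{-1}u) - \log|R^{-1}u| - p$ for positive definite inputs; it is meant only as a check of the formula, not as code from the cited papers.

```python
import numpy as np

def entropy_loss(u, R):
    """Entropy (Stein) loss tr(R^{-1} u) - log|R^{-1} u| - p for p.d. u and R."""
    p = R.shape[0]
    M = np.linalg.solve(R, u)             # R^{-1} u without forming the inverse
    _, logdet = np.linalg.slogdet(M)      # log of the (positive) determinant
    return np.trace(M) - logdet - p
```

For example, `entropy_loss(R, R)` is zero up to rounding, and replacing both arguments by `C @ u @ C.T` and `C @ R @ C.T` for a nonsingular `C` leaves the loss unchanged, which is the invariance mentioned above.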

In estimating R by u(A), Joarder and Ali (1997) considered the risk
function $R(u(A), R) = E[L(u(A), R)]$. An estimator $u_2(A)$ of R will
be said to dominate another estimator $u_1(A)$ of R if, for all R belonging
to the class of positive definite matrices, the inequality $R(u_2(A), R) \le
R(u_1(A), R)$ holds and the strict inequality $R(u_2(A), R) < R(u_1(A), R)$ holds
for at least one R.

Joarder and Ali (1997) obtained three estimators for R, by minimizing 
the risk function of the entropy loss function among three classes of 
estimators. 


• First, it is shown that the unbiased estimator $\tilde{R} = (\nu - 2) A / (\nu n)$ has
the smallest risk among the class of estimators of the form cA, where
c > 0, and the corresponding minimum risk is given by

\[ R(\tilde{R}, R) = p \log n - \sum_{i=1}^{p} E\left[ \log(\chi^2_{n+1-i}) \right] + p \log\left( \frac{\nu}{\nu - 2} \right) - 2p\, E(\log \tau), \]

where $\tau$ has the inverted gamma distribution given by (9.3).

• Second, the estimator $R^* = T D^* T^T$, where T is a lower triangular
matrix such that $A = T T^T$ and $D^* = \mathrm{diag}(d_1^*, \ldots, d_p^*)$ with $d_i^*$ defined
by

\[ d_i^* = \frac{\nu - 2}{\nu} \cdot \frac{1}{n + p + 1 - 2i}, \]

has the smallest risk among the class of estimators $T \Delta T^T$, where $\Delta$
belongs to the class of all positive definite diagonal matrices, and the
corresponding minimum risk function of $R^*$ is given by

\[ R(R^*, R) = \sum_{i=1}^{p} \log(n + 1 + p - 2i) - \sum_{i=1}^{p} E\left[ \log(\chi^2_{n+1-i}) \right] + p \log\left( \frac{\nu}{\nu - 2} \right) - 2p\, E(\log \tau), \]

where $\tau$ is as defined above. Furthermore, $R^*$ dominates the unbiased
estimator $\tilde{R} = (\nu - 2) A / (\nu n)$.

• Finally, consider the estimator $\hat{R} = S \phi(M) S^T$, where A has the
spectral decomposition $A = S M S^T$, with $\phi(M) = D^* M$. Then the
estimator $\hat{R} = S D^* M S^T$ dominates the estimator $R^* = T D^* T^T$.


10.6.2 Estimation of tr(R)


Let $\delta = \mathrm{tr}(R)$ denote the trace of R. Joarder (1995) considered the
estimation of $\delta$ for the multivariate t model under a squared error loss
function following Dey (1988). The usual estimator of $\delta$ is given by
$\tilde{\delta} = c_0 \mathrm{tr}(A)$, where $c_0$ is a known positive constant and A is the Wishart
matrix defined in (9.4). The estimator $\tilde{\delta}$ defines an unbiased estimator
of $\delta$ for $c_0 = (\nu - 2)/(\nu n)$ and a maximum likelihood estimator of $\delta$ for
$c_0 = 1/(n + 1)$ (see, for example, Fang and Anderson, 1990, page 208).
Joarder and Singh (1997) proposed an improved estimator of $\delta$, based
on a power transformation, given by

\[ \hat{\delta} = c_0 \mathrm{tr}(A) + c_0 c \left\{ p |A|^{1/p} - \mathrm{tr}(A) \right\}, \]

where $c_0$ is a known positive constant and c is a constant chosen so that
the mean square error (MSE) of $\hat{\delta}$ is minimized. Calculations show that

\[ \mathrm{MSE}(\hat{\delta}) = \mathrm{MSE}(\tilde{\delta}) + c \beta_1 + c^2 \beta_2, \]

where

\[ \beta_1 = 2 c_0 E\left[ (c_0 \mathrm{tr}(A) - \delta)(p |A|^{1/p} - \mathrm{tr}(A)) \right] \tag{10.13} \]

and

\[ \beta_2 = c_0^2 E\left[ \{ p |A|^{1/p} - \mathrm{tr}(A) \}^2 \right]. \tag{10.14} \]

Thus $\mathrm{MSE}(\hat{\delta})$ is minimized at $c = -\beta_1/(2\beta_2)$ and the minimum value is
given by $\mathrm{MSE}(\tilde{\delta}) - \beta_1^2/(4\beta_2)$. This proves that $\hat{\delta}$ is always better than
the usual estimator in the sense of having a smaller MSE. The estimate of
c is given by $\hat{c} = -\hat{\beta}_1/(2\hat{\beta}_2)$, where $\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained by calculating
the expectations in (10.13) and (10.14) using the numerous properties
given in Section 9.1 and then replacing R by the usual estimator $c_0 A$.
It can be noted from Fang and Anderson (1990, page 208) that the
estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the maximum likelihood estimators of $\beta_1$ and
$\beta_2$, respectively, provided $\hat{R} = c_0 A$ and $c_0 = 1/(n + 1)$.

The following table taken from Joarder and Singh (1997) presents the
percent relative efficiency of $\hat{\delta}$ over $\tilde{\delta}$.


ν      R = diag(1,1,1)    R = diag(4,2,1)    R = diag(25,1,1)

5          105.32             130.31             153.90
10         102.13             117.56             148.76
15         101.53             112.07             127.15


The numbers are from a Monte Carlo study carried out by generating
100 Wishart matrices from the multivariate t-model with n = 25 and
p = 3.


10.7 Generalized Multivariate t Model 


Consider the generalized multivariate t model (9.7) discussed in the
preceding chapter. The usual estimator of R is a multiple of the Wishart
matrix of the form $\tilde{R} = c_0 A$, where $c_0 > 0$. Joarder and Ahmed (1998)
proposed improved estimates for R as well as its trace and inverse under
the quadratic loss function. The proposed estimators for R are

\[ \hat{R} = c_0 A - c |A|^{1/p} I, \tag{10.15} \]

where I is an identity matrix and c is chosen such that $\hat{R}$ is positive
definite. For an estimator $R^*$ of R, let $L(R^*, R) = \mathrm{tr}[(R^* - R)^2]$ denote
the quadratic loss function and let $R(R^*, R) = E L(R^*, R)$ denote the
corresponding risk function. The relationship between $\hat{R}$ and $\tilde{R}$ is rather
involved. Defining the dominance of one estimator over another in the
same manner as in Section 10.6.1, Joarder and Ahmed (1998) established
that $\hat{R}$ dominates $\tilde{R}$ for any c satisfying d < c < 0, where


= np +2 Dp ((n — 1)/2 + 1/p) 
nS (a ~ o ou 


with $c_0 < p\gamma/((n - 1)p + 2)$ or 0 < c < d, where d is given by (10.16)
with $c_0 > p\gamma/(np + 2)$ and $\gamma$ by $\gamma = \gamma_2/\gamma_4$ and $\gamma_i = E(\tau^i)$, i = 1, 2, 3, 4.
The risk functions of the two estimators are given by


à I, (n/2 + 2/p) Jp dtr (R/p) 
a (Rem) = ame RE (e ie) 
+ {1 + (n — 1)coy (con — 27)} tr (R?) 
+(n — leey, (trR)? 


and 


R (R, R) = {1+(n—1)cov (con — 27)} tr (R?) 


+(n — 1) (trR)’ , 


respectively. Now consider estimating the trace $\delta = \mathrm{tr} R$. The usual and
the proposed estimators are $\tilde{\delta} = c_0 \mathrm{tr} A$ and $\hat{\delta} = c_0 \mathrm{tr} A - c p |A|^{1/p}$,
respectively, where $c_0 > 0$ and c is such that the proposed estimator is
positive. Joarder and Ahmed (1998) established that the corresponding
risk functions are given by

\[ R(\tilde{\delta}, \delta) = \left[ (n - 1) c_0 \{ (n - 1) c_0 \gamma_4 - 2\gamma_2 \} + 1 \right] \delta^2 + 2(n - 1) c_0^2 \gamma_4 \mathrm{tr}(R^2) \]


and 


OEE o mh (-— 


respectively. It is evident that $\hat{\delta}$ dominates $\tilde{\delta}$. Finally, consider
estimating the inverse $\Psi = R^{-1}$ with the usual and the proposed estimators
given by $\tilde{\Psi} = c_0 A^{-1}$ and $\hat{\Psi} = c_0 A^{-1} - c |A|^{-1/p} I$, respectively,
where $c_0 > 0$ and c is such that the proposed estimator is positive
definite. In this case, it turns out that $\hat{\Psi}$ dominates $\tilde{\Psi}$ for any c satisfying
d < c < 0, where


7 Co _ Y-2\ Tp ((n = 1)/2 - 1/p) 
eS eas oy ACER EEETA oa, 


with $c_0 < (n - 2/p - p - 2)\gamma_{-2}/\gamma_{-4}$ or 0 < c < d, where d is given by
(10.17) with $c_0 > (n - 2/p - p - 2)\gamma_{-2}/\gamma_{-4}$ and $\gamma_i = E(\tau^i)$.


10.8 Simulation 


Simulation is a key element in modern statistical theory and
applications. In this section, we describe three known approaches for
simulating from multivariate t distributions. Undoubtedly, many other methods
will be proposed and elaborated in the near future.


10.8.1 Vale and Maurelli's Method


Fleishman (1978) noted that the real-world distributions of (univariate) 
variables are typically characterized by their first four moments (that 
is, mean, variance, skewness, and kurtosis). He presented a procedure 
for generating nonnormal random numbers with these four moments 
specified. He accomplished this by taking a nonnormal variable X as a
linear combination of the first three powers of a standard normal random
variable Z

\[ X = a + bZ + cZ^2 + dZ^3. \tag{10.18} \]


To determine the constants, Fleishman expanded (10.18) to express the 
first four moments of X in terms of the first 14 moments of Z. After 
considerable algebraic manipulation, Fleishman was able to represent 
the solution to the constants of (10.18) as a system of nonlinear equa- 
tions. For a standard distribution (that is, with mean zero and variance 
one), the constants b, c, and d are found by simultaneously solving the 
following equations 


\[ b^2 + 6bd + 2c^2 + 15d^2 = 1, \tag{10.19} \]

\[ 2c\,(b^2 + 24bd + 105d^2 + 2) = \gamma_1, \tag{10.20} \]

and

\[ 24 \left\{ bd + c^2 (1 + b^2 + 28bd) + d^2 (12 + 48bd + 141c^2 + 225d^2) \right\} = \gamma_2, \tag{10.21} \]

where $\gamma_1$ is the desired skewness and $\gamma_2$ is the desired kurtosis. The
constant a in (10.18) is determined by

\[ a = -c. \tag{10.22} \]


Univariate nonnormal random numbers are then generated by drawing 
normal random numbers and transforming them using the constants a, 
b, c, and d in (10.18). 
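The system (10.19)-(10.21) has no closed-form solution, so the constants are found numerically. The sketch below uses a plain Newton iteration with a finite-difference Jacobian, started from the normal case $(b, c, d) = (1, 0, 0)$. The starting point, step size, and function names are choices made for this illustration, and convergence is not guaranteed for every admissible pair of skewness and kurtosis.

```python
def fleishman_residuals(b, c, d, skew, kurt):
    """Left-hand sides of (10.19)-(10.21) minus their targets."""
    f1 = b * b + 6 * b * d + 2 * c * c + 15 * d * d - 1
    f2 = 2 * c * (b * b + 24 * b * d + 105 * d * d + 2) - skew
    f3 = 24 * (b * d + c * c * (1 + b * b + 28 * b * d)
               + d * d * (12 + 48 * b * d + 141 * c * c + 225 * d * d)) - kurt
    return [f1, f2, f3]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, rhs):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    D = det3(A)
    out = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = rhs[i]
        out.append(det3(Ak) / D)
    return out

def fleishman_coefficients(skew, kurt, n_iter=100, tol=1e-12, h=1e-6):
    """Return (a, b, c, d) for X = a + bZ + cZ^2 + dZ^3 with a = -c."""
    v = [1.0, 0.0, 0.0]                      # start at the normal case
    for _ in range(n_iter):
        f = fleishman_residuals(v[0], v[1], v[2], skew, kurt)
        if max(abs(t) for t in f) < tol:
            break
        J = [[0.0] * 3 for _ in range(3)]    # central-difference Jacobian
        for k in range(3):
            vp, vm = list(v), list(v)
            vp[k] += h
            vm[k] -= h
            fp = fleishman_residuals(vp[0], vp[1], vp[2], skew, kurt)
            fm = fleishman_residuals(vm[0], vm[1], vm[2], skew, kurt)
            for i in range(3):
                J[i][k] = (fp[i] - fm[i]) / (2 * h)
        step = solve3(J, f)
        v = [vi - si for vi, si in zip(v, step)]
    b, c, d = v
    return -c, b, c, d
```

With the constants in hand, a nonnormal draw is obtained by evaluating `a + b*z + c*z**2 + d*z**3` at a standard normal draw `z`.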

Vale and Maurelli (1983) extended Fleishman's procedure to
multivariate nonnormal distributions with specified intercorrelations as well
as specified moments. The procedure begins by specifying the constants
necessary for Fleishman's procedure. For each variable independently,
these are given by the solution of (10.19)-(10.22). Define two variables
$Z_1$ and $Z_2$ as from standard normal populations, and define the
vector z as $z^T = [1, Z, Z^2, Z^3]$. The weight vector $w^T$ contains the power
function weights a, b, c, and d: $w^T = [a, b, c, d]$. The nonnormal variable
X then becomes $X = w^T z$. If $r_{X_1 X_2}$ denotes the correlation between
two nonnormal variables $X_1$ and $X_2$ corresponding to the normal
variables $Z_1$ and $Z_2$, it is then easily seen that $r_{X_1 X_2} = w_1^T R w_2$, where
$X_1 = w_1^T z_1$, $X_2 = w_2^T z_2$, and R is the expected matrix product of $z_1$
and $z_2^T$ given by

\[ R = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & r_{Z_1 Z_2} & 0 & 3 r_{Z_1 Z_2} \\ 1 & 0 & 2 r_{Z_1 Z_2}^2 + 1 & 0 \\ 0 & 3 r_{Z_1 Z_2} & 0 & 6 r_{Z_1 Z_2}^3 + 9 r_{Z_1 Z_2} \end{pmatrix}. \]

Collecting the terms and using (10.22), a third-degree polynomial in
$r_{Z_1 Z_2}$, the correlation between the normal variables $Z_1$ and $Z_2$, results:

\[ r_{X_1 X_2} = r_{Z_1 Z_2} (b_1 b_2 + 3 b_1 d_2 + 3 d_1 b_2 + 9 d_1 d_2) + 2 c_1 c_2 r_{Z_1 Z_2}^2 + 6 d_1 d_2 r_{Z_1 Z_2}^3. \]

Solving this polynomial for $r_{Z_1 Z_2}$ provides the correlation required to
obtain the desired post-transformation correlation $r_{X_1 X_2}$. These
correlations can then be assembled into a matrix of intercorrelations, and this
matrix can be decomposed to yield multivariate normal random numbers
for input into Fleishman's transformation procedure.
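The intermediate correlation can be found by solving the cubic numerically. The sketch below uses bisection on [-1, 1]; it assumes the cubic changes sign exactly once on that interval, which holds for mild departures from normality but is not guaranteed in general. The function names and the coefficient tuples `(a, b, c, d)` used in the example are hypothetical.

```python
def post_transform_corr(r, coef1, coef2):
    """Correlation of X1 and X2 implied by a normal correlation r."""
    _, b1, c1, d1 = coef1
    _, b2, c2, d2 = coef2
    return (r * (b1 * b2 + 3 * b1 * d2 + 3 * d1 * b2 + 9 * d1 * d2)
            + 2 * c1 * c2 * r ** 2 + 6 * d1 * d2 * r ** 3)

def intermediate_corr(target, coef1, coef2, n_iter=200):
    """Solve post_transform_corr(r) = target for r in [-1, 1] by bisection."""
    def g(r):
        return post_transform_corr(r, coef1, coef2) - target
    lo, hi = -1.0, 1.0
    if g(lo) * g(hi) > 0:
        raise ValueError("target correlation not attainable on [-1, 1]")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid        # root lies in [lo, mid]
        else:
            lo = mid        # root lies in [mid, hi]
    return 0.5 * (lo + hi)
```

When both variables are normal, that is, `coef = (0, 1, 0, 0)`, the polynomial reduces to the identity and the intermediate correlation equals the target.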


10.8.2 Vaduva’s Method 


Vaduva (1985) provided a general algorithm for generating from
multivariate distributions and illustrated its applicability for multivariate
normal, Dirichlet, and multivariate t distributions. Here, we present a
specialized version of the algorithm for generating the p-variate t
distribution with the joint pdf

\[ f(x) = \frac{\Gamma((\nu + p)/2)}{(\pi \nu)^{p/2}\, \Gamma(\nu/2)\, |R|^{1/2}} \left[ 1 + \frac{1}{\nu} x^T R^{-1} x \right]^{-(\nu + p)/2} \]

over some domain D in $\mathbb{R}^p$. It is as follows:

(i) Initialize. 

(ii) Determine an interval $I = [v_0^0, v_0^1] \times \cdots \times [v_p^0, v_p^1]$, where


Uo = 0, 
py =~, ds 
1 
v? = Aia ) i=l, P, 
v-—1 
and 
1 
v} — Or ) =1,. »P 




(iii) Generate the random vector $V^*$ uniformly distributed over I. If
RND is a uniform random number generator, then $V^*$ may be
generated as follows:

(a) Generate $U_0, U_1, \ldots, U_p$ uniformly distributed over [0, 1]
and stochastically independent.
(b) Calculate $V_i^* = v_i^0 + (v_i^1 - v_i^0) U_i$, i = 0, 1, ..., p.
(c) Take $V^* = (V_0^*, V_1^*, \ldots, V_p^*)^T$.
(iv) If $V^* \notin D$, then go to step (iii).
(v) Otherwise, take $V = V^*$.
(vi) Calculate $Y_i = V_i / V_0$, i = 1, ..., p.
(vii) Take $X = (Y_1, \ldots, Y_p)^T$. Stop.


Note that the steps from (iii) to (v) constitute a rejection algorithm. 
The performance of this algorithm is characterized by the probability to 
accept V*. This probability can be calculated in the form 


z nP/2T (v/2) (: = 
Po D+ DE (V+ p)/2 PF ” 


which yields

\[ \lim_{\nu \to \infty} p_a = 0 \]

and

\[ \lim_{p \to \infty} p_a = 0, \]

indicating inadequate behavior of the algorithm for large values of p
and/or $\nu$.


10.8.3 Simulation Using BUGS 


A relatively simple way to generate a multivariate t involves sampling
z from gamma($\nu/2$, $\nu/2$) and then sampling a multivariate normal
$y \sim N(\mu, R/z)$. This mode of generation reflects the scale mixture
form of the multivariate t pdf. In BUGS the multivariate normal is
parameterized by the precision matrix P; thus one programs a multivariate
t pdf as follows to generate a sample of n cases (for Sigma[,], nu.2 and
mu[] known)


for (i in 1:n)
   {z[i] ~ dgamma(nu.2, nu.2)
    y[i, 1:q] ~ dmnorm(mu[], P.sc[i, , ])
    # precision of case i is z[i] * P
    for (j in 1:q) {for (k in 1:q)
       {P.sc[i, j, k] <- z[i] * P[j, k]}}}
for (j in 1:q) {for (k in 1:q)
   {P[j, k] <- inverse(Sigma[,], j, k)}}


If one has observed multivariate data and wishes to assume multivari- 
ate t sampling, then in BUGS the dmt() form is available 


for (i in 1:n) {y[i, 1:q] ~ dmt(mu[], P[,], nu)}


where nu is assumed known. 
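Outside BUGS, the same scale mixture construction takes only a few lines. The following is a minimal sketch using NumPy; the function name `sample_multivariate_t` is hypothetical, and note that NumPy's gamma generator is parameterized by shape and scale, so a rate of $\nu/2$ corresponds to a scale of $2/\nu$.

```python
import numpy as np

def sample_multivariate_t(n, mu, R, nu, rng):
    """Draw n rows from a multivariate t via the normal scale mixture."""
    mu = np.asarray(mu, dtype=float)
    R = np.asarray(R, dtype=float)
    # z ~ gamma(shape nu/2, rate nu/2), i.e. scale 2/nu in NumPy's convention
    z = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
    L = np.linalg.cholesky(R)
    e = rng.standard_normal((n, mu.size)) @ L.T     # rows are N(0, R)
    return mu + e / np.sqrt(z)[:, None]             # rows are N(mu, R / z_i)
```

For $\nu > 2$ the sample covariance of the draws should be close to $\nu R/(\nu - 2)$ rather than to $R$, which gives a simple sanity check on the output.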


11 


Regression Models 


There is a large number of contributions (scattered in the literature and 
many of them motivated by economic applications) dealing with regres- 
sion models with the error term distributed according to the multivariate 
t distribution. In this chapter, we shall discuss several of them. 


11.1 Classical Linear Model 


Let the model for n observations $Y = (y_1, \ldots, y_n)^T$ be

\[ Y = X\beta + \epsilon, \tag{11.1} \]

where X is an $n \times p$ design matrix with rank p, $\beta$ is a $p \times 1$ vector of
regression parameters with unknown values, and $\epsilon$ is an $n \times 1$ random
error vector. For the usual t regression model it is assumed that the n
elements of $\epsilon$ have the multivariate t pdf

\[ f(\epsilon) = \frac{\nu^{\nu/2}\, \Gamma((n + \nu)/2)}{\pi^{n/2}\, \sigma^n\, \Gamma(\nu/2)} \left[ \nu + \frac{\epsilon^T \epsilon}{\sigma^2} \right]^{-(n + \nu)/2}. \]

In practice, there are several situations in which the model (11.1) is
useful. Under (11.1), the least squares estimate of $\beta$ is

\[ \hat{\beta} = (X^T X)^{-1} X^T y. \tag{11.2} \]

Zellner (1976) noted that this is also the maximum likelihood estimate
of $\beta$. From Singh (1991), $\hat{\beta}$ is a minimum variance linear unbiased
estimator and also a minimum variance unbiased estimator. The
variance-covariance matrix for $\hat{\beta}$ is

\[ \mathrm{Var}(\hat{\beta}) = E(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T = \frac{\nu \sigma^2}{\nu - 2} (X^T X)^{-1}. \tag{11.3} \]

Note that as $\nu \to \infty$, the above variance approaches $(X^T X)^{-1} \sigma^2$, which
is the variance-covariance matrix in the normal case. Thus, for small
and moderate values of $\nu$, the variances of the elements of $\hat{\beta}$ are inflated
considerably, as compared to those for large values of $\nu$.
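Singh (1988) provided the following estimate of the degrees of freedom
parameter

\[ \hat{\nu} = \frac{2(2\hat{a} - 3)}{\hat{a} - 3}, \]

where

\[ \hat{a} = \frac{(1/n) \sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^4}{\left\{ (1/n) \sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2 \right\}^2} \]

and $\hat{\beta}$ is the least squares estimator given by (11.2).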
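The estimate of the degrees of freedom amounts to inverting the kurtosis of a t variable, $3(\nu - 2)/(\nu - 4)$, at the sample kurtosis of the residuals. A minimal sketch follows; the function names are illustrative, and the explicit check that the sample kurtosis exceeds 3 is an added safeguard rather than part of Singh's proposal.

```python
def nu_from_kurtosis(a):
    """Invert a = 3(nu - 2)/(nu - 4), giving nu = 2(2a - 3)/(a - 3)."""
    if a <= 3:
        raise ValueError("sample kurtosis must exceed 3 for a finite estimate")
    return 2 * (2 * a - 3) / (a - 3)

def estimate_nu(residuals):
    """Moment estimate of nu from least squares residuals."""
    n = len(residuals)
    m2 = sum(e ** 2 for e in residuals) / n
    m4 = sum(e ** 4 for e in residuals) / n
    return nu_from_kurtosis(m4 / m2 ** 2)
```

For instance, a residual kurtosis of 4 corresponds to $\hat{\nu} = 10$, and the estimate grows without bound as the kurtosis approaches the normal value 3.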
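The maximum likelihood estimate of $\sigma^2$ is

\[ \hat{\sigma}^2 = \frac{1}{n} (y - X\hat{\beta})^T (y - X\hat{\beta}), \]

as in the normal case. For $\nu > 2$,

\[ E(\hat{\sigma}^2) = \frac{(n - p)\, \sigma_u^2}{n}, \]

where $\sigma_u^2 = \nu \sigma^2/(\nu - 2)$ is the common variance of the elements of $\epsilon$.
Thus, $\hat{\epsilon}^T \hat{\epsilon}/(n - p)$, where $\hat{\epsilon} = y - X\hat{\beta}$, is an unbiased estimator for $\sigma_u^2$ while

\[ \tilde{\sigma}^2 = \frac{\nu - 2}{\nu (n - p)}\, \hat{\epsilon}^T \hat{\epsilon} \tag{11.4} \]

is an unbiased estimator for $\sigma^2$. In the class of estimators $q \hat{\epsilon}^T \hat{\epsilon}$, with
q being a positive scalar, the minimal mean squared error estimator for
$\sigma^2$ is (with $\nu > 4$)

\[ \bar{\sigma}^2 = \frac{\nu - 4}{\nu (n - p + 2)}\, \hat{\epsilon}^T \hat{\epsilon}, \tag{11.5} \]

while the minimal mean squared error estimator for $\sigma_u^2$ in this class is
$(\nu - 4) \hat{\epsilon}^T \hat{\epsilon} / \{(\nu - 2)(n - p + 2)\}$. The variances of the unbiased and the
minimal mean squared error estimators of $\sigma^2$ are

\[ \mathrm{Var}(\tilde{\sigma}^2) = \frac{2\sigma^4}{n - p} \cdot \frac{n - p + \nu - 2}{\nu - 4} \tag{11.6} \]

and

\[ \mathrm{Var}(\bar{\sigma}^2) = \frac{2 (n - p)(\nu - 4)(n - p + \nu - 2)}{(\nu - 2)^2 (n - p + 2)^2}\, \sigma^4, \tag{11.7} \]

respectively. Since $\hat{\epsilon}^T \hat{\epsilon}/(n - p)$ is an unbiased estimator for $\sigma_u^2$, the variance (11.3)
can be unbiasedly estimated by

\[ \widehat{\mathrm{Var}}(\hat{\beta}) = (X^T X)^{-1}\, \frac{\hat{\epsilon}^T \hat{\epsilon}}{n - p}. \]

Similarly, (11.4) and (11.5) can be estimated by

\[ \tilde{\sigma}^2 = \frac{\hat{a}}{(2\hat{a} - 3)(n - p)}\, \hat{\epsilon}^T \hat{\epsilon} \tag{11.8} \]

and

\[ \bar{\sigma}^2 = \frac{3}{(2\hat{a} - 3)(n - p + 2)}\, \hat{\epsilon}^T \hat{\epsilon}, \tag{11.9} \]

respectively. The estimates for the variances given by (11.6)-(11.7) are

\[ \widehat{\mathrm{Var}}(\tilde{\sigma}^2) = \frac{2 (\sigma^{*2})^2}{n - p} \cdot \frac{n - p + \hat{\nu} - 2}{\hat{\nu} - 4} \]

and

\[ \widehat{\mathrm{Var}}(\bar{\sigma}^2) = \frac{2 (n - p)(\hat{\nu} - 4)(n - p + \hat{\nu} - 2)}{(\hat{\nu} - 2)^2 (n - p + 2)^2}\, (\sigma^{*2})^2, \]

respectively, where $\sigma^{*2}$ may be taken as $\tilde{\sigma}^2$ given in (11.8) or as $\bar{\sigma}^2$ given
in (11.9).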
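The unbiased and minimal mean squared error estimators of $\sigma^2$, together with their variances, can be tabulated directly from the residual sum of squares. The sketch below treats $\nu$ as known and uses illustrative function names; `rss` stands for $\hat{\epsilon}^T \hat{\epsilon}$.

```python
def sigma2_unbiased(rss, n, p, nu):
    """Unbiased estimator of sigma^2, as in (11.4); requires nu > 2."""
    return (nu - 2) * rss / (nu * (n - p))

def sigma2_min_mse(rss, n, p, nu):
    """Minimal MSE estimator of sigma^2 in the class q * rss, as in (11.5)."""
    return (nu - 4) * rss / (nu * (n - p + 2))

def var_sigma2_unbiased(sigma2, n, p, nu):
    """Variance of the unbiased estimator, as in (11.6); requires nu > 4."""
    return 2 * sigma2 ** 2 * (n - p + nu - 2) / ((n - p) * (nu - 4))

def var_sigma2_min_mse(sigma2, n, p, nu):
    """Variance of the minimal MSE estimator, as in (11.7); requires nu > 4."""
    return (2 * (n - p) * (nu - 4) * (n - p + nu - 2) * sigma2 ** 2
            / ((nu - 2) ** 2 * (n - p + 2) ** 2))
```

With n = 25, p = 5, and $\nu = 10$, the minimal MSE estimator is a shrunken version of the unbiased one: its variance is smaller, and even after adding its squared bias the mean squared error stays below the variance of the unbiased estimator.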

It is important to note that, even though the elements of $\epsilon$ have a
nonnormal pdf and are not independent, tests and intervals based on
the usual t and F statistics still remain valid. For example, $t = (\hat{\beta}_i -
\beta_i) / \{s^2 m^{ii}\}^{1/2}$, where $m^{ii}$ is the (i,i)th element of $(X^T X)^{-1}$ and $s^2 =
\hat{\epsilon}^T \hat{\epsilon}/(n - p)$, has the usual Student's t distribution, and thus probability
statements based on this statistic will be appropriate. Also, $s^2/\sigma^2$ has
the usual F distribution with degrees of freedom $n - p$ and $\nu$. This fact
can be used to construct confidence intervals for and test hypotheses
about $\sigma^2$.

Singh et al. (1995) proposed the generalized estimator $\hat{\beta}_g = g(t) \hat{\beta}$
for $\beta$, where $t = \hat{\epsilon}^T \hat{\epsilon} / (\hat{\beta}^T X^T X \hat{\beta})$ has at least the first k > 6 moments
finite and g(t), satisfying the validity conditions of Taylor's series
expansion and having the first three derivatives with respect to t bounded,
is a bounded function of t such that g(0) = 1 and g(t) = O(1) as
$\theta = \beta^T X^T X \beta \to \infty$. It should be noted that the maximum
likelihood estimator $\hat{\beta}$ and the estimators considered by Singh (1991) are
all particular cases of $\hat{\beta}_g$. Singh et al. (1995) investigated the bias and the risk
$\rho(\hat{\beta}_g) = E[(\hat{\beta}_g - \beta)^T Q (\hat{\beta}_g - \beta)]$ of the generalized estimator when
Q is a positive definite matrix. It was established that


x 7 (n — p)vo?g' (0) vo? (p — 2) 
E (ĝ,) = ß+ TO k- bw — 4) 


ee aN Oleo (=) 


28(v — 4)g' (0) 03 
and 
A vo? -1 (n — pyy’o%g'(0) 
P(B) = Fy {(X7x)" @} + Tae 


x [a {(x7x) Q} + naps e] 


Since $\rho(\hat{\beta}_g) < \rho(\hat{\beta})$, one observes that $\hat{\beta}_g$ dominates the
maximum likelihood estimator $\hat{\beta}$. Also, in the class $\hat{\beta}_g$, there exist better
estimators than those considered in Singh (1991).

Sutradhar (1988b) considered testing $H_0 : C\beta = 0$ versus $H_1 : C\beta \neq
0$, using the classical F statistic

\[ W = \frac{W_2 - W_1}{W_1}, \]

where

\[ W_1 = Y^T \left( I_n - X (X^T X)^{-1} X^T \right) Y \]

is the residual sum of squares of the full model (11.1) and

\[ W_2 = Y^T \left( I_n - Z (Z^T Z)^{-1} Z^T \right) Y \]

is the residual sum of squares of the reduced model

\[ E(Y) = Z\alpha, \tag{11.10} \]


which is obtained from (11.1) by using the restriction under $H_0$. In $H_0$,
C is an $r \times p$ matrix of known coefficients with rank(C) = q, and, in
the reduced model (11.10), Z denotes the new design matrix of order
$n \times (p - q)$ and $\alpha$ is a vector of $(p - q)$ parameters. Sutradhar (1988b)
established that the pdf of W is given by


Do v q n-p 
f(w) =z yaa 2 Bora (k +1,5 -1) Bo GE: 2 i 
where 
_ T(a+b)227} 
felad = Taro +a 
and 6 is the noncentrality parameter given by 
_ Yr-2a7lyry _ yr T7\-1 oT 
ô = <P" (XTX - x72 (Z7Z) Z"X) B. 


Sutradhar also computed the corresponding power of the test, yielding 
the expression 


2 a ù a 1 


where $u_0 = 1/[1 + (q/(n - p)) F_{q, n-p, \alpha}]$ and $I_u(a, b)$ denotes the
incomplete beta function ratio. As $\nu \to \infty$, this expression reduces to the
power of the F test under normality (Tiku, 1967).

The distribution of future responses given a set of data from an
informative experiment is known as a predictive distribution. Haq and Khan
(1990) derived the predictive distribution for (11.1). Rewrite (11.1) in
the equivalent form $y = \beta X + \sigma e$ and let $y_f$ be a future response
corresponding to the design vector $x_f$, that is, $y_f = \beta x_f + \sigma e_f$. Haq and
Khan (1990) showed that the predictive pdf of $y_f$ is given by


f (ys ly) 


x [2 +s~*(y) {yz — Bly) xs} (1— xf A7*xy) {ys — blys} 


’ 


joe 


where $b(e) = e X^T (X X^T)^{-1}$, $s^2(e) = (e - \hat{e})(e - \hat{e})^T$, $\hat{e} = b(e) X$, and
$A = X X^T + x_f x_f^T$. Thus, for the given informative data y, the
predictive distribution of $y_f$ is t with mean vector $b(y) x_f$, variance-covariance
matrix $(n - p) s^2(y) / \{(n - p - 2)(1 - x_f^T A^{-1} x_f)\}$, and degrees of
freedom $n - p$. A prediction interval of the desired coverage probability can
easily be obtained by using the standard t-table. Note that the
predictive distribution does not depend on the degrees of freedom parameter
of the original t distribution. For a set of $n'$ future responses given by




Y; = @Xy+0e;, Hag and Khan (1990) noted similarly that the predic- 
tive distribution of y+ is n'-variate t with mean vector b(y)X¥, variance- 
covariance matrix | In, — X;Q-!X, [1/2 s(y) (where Q = XXT + 
X pXF) and degrees of freedom n — p. It is to be noted that the distri- 
bution of (n—p)s~?(y)(Y¢ —b(y) Xs) (In ~X Q Xz) (Vz — W(y) Xz)” 
is F with degrees of freedom n’, and n — p. This distribution can be uti- 
lized for determining the prediction region for a set of future responses 
with any desired coverage probability. 


11.2 Bayesian Linear Models 


In his classical paper, Zellner (1976) provided a Bayesian analysis of the
linear model (11.1). Consider the diffuse prior for $\beta$ and $\sigma^2$, that is,

\[ p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}, \tag{11.11} \]

where $0 < \sigma^2 < \infty$ and $\beta_i \in \mathbb{R}$, i = 1, ..., p. Then, assuming that $\nu$ is
known, the posterior pdf of the parameters is

\[ p(\beta, \sigma^2 \mid y, \nu) \propto \frac{[\nu \sigma^2 / A(\beta)]^{(\nu - 2)/2}}{A(\beta)^{(n + 2)/2}\, [1 + \nu \sigma^2 / A(\beta)]^{(n + \nu)/2}}, \]

where $A(\beta) = (y - X\beta)^T (y - X\beta)$. It follows that the conditional
posterior pdf of $\beta$ given $\sigma^2$ and $\nu$ is in the form of a multivariate t pdf
with mean $\hat{\beta}$ (the least squares estimate in (11.2)). The corresponding
conditional posterior covariance matrix is given by

\[ \mathrm{Var}(\beta \mid y, \sigma^2, \nu) = \frac{\nu \sigma^2 + (n - p) s^2}{n - p + \nu - 2}\, (X^T X)^{-1}, \]

provided that $n - p + \nu > 2$, where $(n - p) s^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})$. As
$\nu \to \infty$, the conditional posterior pdf for $\beta$ given $\sigma^2$ approaches a
multivariate normal pdf with mean $\hat{\beta}$ and covariance matrix $(X^T X)^{-1} \sigma^2$,
which is the usual result for the normal regression model with the diffuse
prior pdf (11.11). The marginal posterior pdf for $\beta$ is

p(Bly,v) œ fin — p)s? + (8 ~B)" XTX (8 - a)" 
(11.12) 


which is in the form of a p-dimensional t pdf and does not depend on
the value of $\nu$. In fact, (11.12) is precisely the result that one obtains
in the Bayesian analysis of the normal regression model with the diffuse
prior for the parameters shown in (11.11). The marginal posterior pdf
for $\sigma^2$ is


$$p(\sigma^2 \mid y, \nu) \propto \left(\frac{\sigma^2}{s^2}\right)^{(\nu - 2)/2} \left(1 + \frac{\nu\sigma^2}{(n - p)s^2}\right)^{-(n - p + \nu)/2},$$


from which it follows that $\sigma^2/s^2$ has the F pdf with degrees of freedom
$\nu$ and $n - p$, a result paralleling the classical results mentioned in the
preceding section. From properties of the F distribution, the modal
value of $\sigma^2/s^2$ is $((n - p)/\nu)((\nu - 2)/(n - p + 2))$ when $\nu > 2$, and its
mean is $(n - p)/(n - p - 2)$ when $n - p > 2$. Also, as $\nu \to \infty$, the
posterior distribution of $(n - p)s^2/\sigma^2$ approaches a chi-squared distribution
with degrees of freedom $n - p$, a distributional result that holds for the
Bayesian analysis of the usual normal regression model with diffuse prior
assumptions. Finally, note that the posterior pdf for
$n\sigma^2/[(y - X\beta)^T (y - X\beta)]$ is $F_{\nu, n}$.
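
The F-based summaries of $\sigma^2/s^2$ quoted above can be checked numerically. The following is a minimal sketch, not part of Zellner's analysis: the function names and the values $\nu = 5$, $n - p = 20$ are assumptions chosen only for illustration. It evaluates the stated mode and mean and compares the mode with a grid search over the unnormalized F kernel.

```python
import math

def f_mode(d1, d2):
    """Mode of an F(d1, d2) variable, valid for d1 > 2."""
    return (d2 / d1) * ((d1 - 2) / (d2 + 2))

def f_mean(d2):
    """Mean of an F(d1, d2) variable, valid for d2 > 2."""
    return d2 / (d2 - 2)

def log_f_kernel(x, d1, d2):
    """Log of the unnormalized F(d1, d2) density at x > 0."""
    return (d1 / 2 - 1) * math.log(x) - (d1 + d2) / 2 * math.log(1 + d1 * x / d2)

def grid_mode(d1, d2, upper=5.0, steps=50000):
    """Locate the maximizer of the kernel on a fine grid over (0, upper]."""
    grid = [upper * (i + 1) / steps for i in range(steps)]
    return max(grid, key=lambda x: log_f_kernel(x, d1, d2))

# Hypothetical values: nu = 5 degrees of freedom, n - p = 20.
nu, n_minus_p = 5, 20
print(f_mode(nu, n_minus_p), grid_mode(nu, n_minus_p), f_mean(n_minus_p))
```

Here `f_mode(nu, n - p)` plays the role of the posterior mode of $\sigma^2/s^2$ and `f_mean(n - p)` that of its posterior mean; the grid search is only a consistency check on the closed form.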

The natural conjugate prior distribution for $\sigma^2$ and $\beta$ is the product
of the marginal F pdf for $\sigma^2$ times a conditional p-dimensional t pdf for
$\beta$ given $\sigma^2$, that is,


$$p(\beta, \sigma^2 \mid \cdot) = p_F(\sigma^2 \mid \cdot) \, p_S(\beta \mid \sigma^2, \cdot), \tag{11.13}$$


where


$$p_F(\sigma^2 \mid \cdot) \propto \frac{(\sigma^2 / s_a^2)^{(\nu - 2)/2}}{(1 + \nu\sigma^2 / \nu_a s_a^2)^{(\nu + \nu_a)/2}}$$


(where $\nu_a > 0$, $s_a > 0$, and $0 < \sigma < \infty$) and


$$p_S(\beta \mid \sigma^2, \bar\beta, A, \bar\nu_a) \propto \frac{|A|^{1/2}}{\bar\sigma^{p}} \left\{\bar\nu_a + \frac{1}{\bar\sigma^2} (\beta - \bar\beta)^T A (\beta - \bar\beta)\right\}^{-(\bar\nu_a + p)/2},$$


where $\beta_i \in \mathbb{R}$, $i = 1, \ldots, p$, $A$ is symmetric and positive definite, $\bar\beta$ is
the prior mean vector, $\bar\nu_a = \nu + \nu_a$, and $\bar\sigma^2 = (\nu_a s_a^2 + \nu\sigma^2)/\bar\nu_a$. As
with the natural conjugate for the usual normal regression model, it is
seen that $\beta$ and $\sigma^2$ are not independent in the natural conjugate prior
distribution in (11.13). If the natural conjugate prior distribution is
thought to represent the available prior information adequately, it can
be used for obtaining the posterior distribution; see the appendix in
Zellner (1976) for details.




11.3 Indexed Linear Models 


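Lange et al. (1989) and Fernandez and Steel (1999) provided a far-reaching
extension of (11.1) to handle the situation when the $y_i$'s are assumed
to have the t distribution with degrees of freedom $\nu_i$ and parameters
$\mu = g_i(\theta)$ and $R = h_i(\phi)$ indexed by some unknown parameters $\theta$
and $\phi$. Lange et al. (1989) suggested an EM algorithm for estimation.
They also considered methods for computing standard errors, developed
graphical diagnostic checks, and provided applications to a variety of
problems. The problems include linear and nonlinear regression, robust
estimation of the mean and covariance matrix with missing data,
unbalanced multivariate repeated-measures data, multivariate modeling of
pedigree data, and multivariate nonlinear regression. They also derived
the expected information matrix for $(\theta, \phi, \nu)$ for one observation in the
form


$$E\left(-\frac{\partial^2 \log L}{\partial\theta_i \, \partial\theta_j}\right) = \frac{\nu + p}{\nu + p + 2} \, \frac{\partial\mu^T}{\partial\theta_i} R^{-1} \frac{\partial\mu}{\partial\theta_j},$$

$$E\left(-\frac{\partial^2 \log L}{\partial\phi_i \, \partial\phi_j}\right) = \frac{\nu + p}{2(\nu + p + 2)} \, \mathrm{tr}\left(R^{-1} \frac{\partial R}{\partial\phi_i} R^{-1} \frac{\partial R}{\partial\phi_j}\right) - \frac{1}{2(\nu + p + 2)} \, \mathrm{tr}\left(R^{-1} \frac{\partial R}{\partial\phi_i}\right) \mathrm{tr}\left(R^{-1} \frac{\partial R}{\partial\phi_j}\right),$$

$$E\left(-\frac{\partial^2 \log L}{\partial\phi_i \, \partial\nu}\right) = -\frac{1}{(\nu + p + 2)(\nu + p)} \, \mathrm{tr}\left(R^{-1} \frac{\partial R}{\partial\phi_i}\right),$$

and

$$E\left(-\frac{\partial^2 \log L}{\partial\nu^2}\right) = -\frac{1}{2}\left\{\frac{1}{2}\psi\left(\frac{\nu + p}{2}\right) - \frac{1}{2}\psi\left(\frac{\nu}{2}\right) + \frac{p}{\nu(\nu + p)} - \frac{1}{\nu + p} + \frac{\nu + 2}{\nu(\nu + p + 2)}\right\},$$


where $\psi(x) = d^2 \log\Gamma(x)/dx^2$ is the trigamma function. The remaining
elements of the matrix are zero.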
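
As a sanity check on the expected information matrix above, it may help to specialize to the univariate case. The following worked block assumes $p = 1$, $\mu = \theta$, and $R = \phi$ (a scalar scale parameter); this special case is an illustrative assumption, not one singled out by Lange et al. (1989).

```latex
\begin{align*}
E\left(-\frac{\partial^2 \log L}{\partial\theta^2}\right)
  &= \frac{\nu + 1}{(\nu + 3)\,\phi}, \\
E\left(-\frac{\partial^2 \log L}{\partial\phi^2}\right)
  &= \frac{\nu + 1}{2(\nu + 3)\,\phi^2} - \frac{1}{2(\nu + 3)\,\phi^2}
   = \frac{\nu}{2(\nu + 3)\,\phi^2}, \\
E\left(-\frac{\partial^2 \log L}{\partial\phi\,\partial\nu}\right)
  &= -\frac{1}{(\nu + 1)(\nu + 3)\,\phi}.
\end{align*}
```

As $\nu \to \infty$, the first two entries tend to $1/\phi$ and $1/(2\phi^2)$, the familiar normal-theory values for a mean and a variance, and the cross term vanishes.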

In an important paper, Fernandez and Steel (1999) revealed some
pitfalls with a model of the above kind. Under a commonly used
noninformative prior, they showed that Bayesian inference is precluded for
certain samples, even though there exists a well-defined conditional
distribution of the parameters given the observables. They also noted that
global maximization of the likelihood function is a vacuous exercise since
the latter becomes unbounded as one tends to the boundary of the
parameter space. More specifically, let $l(\theta, \sigma, R, \nu)$ be the likelihood
function for $n$ independent observations $y_i$ assumed to have the t
distribution with mean vector $g_i(\theta)$, common covariance matrix $\sigma^2 R$, and
common degrees of freedom $\nu$. For given values of $\theta = \theta_0$, $R = R_0$, and
$\nu = \nu_0$, let $0 \le s(\theta_0) < n$ denote the number of observations for which
$y_i = g_i(\theta_0)$. Then the following hold:


(a) If
$$\nu_0 < \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) = \infty.$$

(b) If
$$\nu_0 = \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) \in (0, \infty).$$

(c) If
$$\nu_0 > \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) = 0.$$


It is evident from this result that one can determine a value $\theta_0$ such that
$y_i = g_i(\theta_0)$ holds for at least one observation and the likelihood function
does not possess a global maximum. Indeed, for sufficiently small values
of $\nu$, one can make $l(\theta_0, \sigma, R_0, \nu_0)$ arbitrarily large by letting $\sigma$ tend to
zero. These pitfalls arise as a consequence of the (sometimes neglected)
fact that the recorded data have zero probability under the assumed
model. Fernandez and Steel (1999) proposed and illustrated a Bayesian
analysis on the basis of set observations, which takes into account the
precision with which the data were originally recorded.
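
The unboundedness of the likelihood can be seen in a small numerical experiment. The sketch below is an illustration under stated assumptions, not the analysis of Fernandez and Steel (1999): it takes the univariate case ($p = 1$) with a constant location, and the sample and the function name `t_loglik` are hypothetical. Three of the four observations coincide with the location, so the critical value of the degrees of freedom is $p\,s/(n - s) = 3$.

```python
import math

def t_loglik(data, loc, scale, nu):
    """Log-likelihood of i.i.d. univariate Student t data with the given
    location, scale, and degrees of freedom."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    total = 0.0
    for y in data:
        z = (y - loc) / scale
        total += const - math.log(scale) - (nu + 1) / 2 * math.log1p(z * z / nu)
    return total

# Hypothetical sample: s = 3 of the n = 4 observations sit exactly at loc = 0.
data = [0.0, 0.0, 0.0, 1.0]
scales = [1e-2, 1e-4, 1e-6]

for nu in (1.0, 5.0):
    values = [t_loglik(data, 0.0, s, nu) for s in scales]
    print(nu, [round(v, 2) for v in values])
```

With $\nu = 1$ (below the critical value) the log-likelihood keeps growing as the scale shrinks, whereas with $\nu = 5$ (above it) the log-likelihood decreases without bound, in line with cases (a) and (c).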




11.4 General Linear Model 
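Rubin (1983) and Sutradhar and Ali (1986) considered the general linear
model set up in the form


$$Y = \beta X + \epsilon, \tag{11.14}$$


where $X$ is a $k \times n$ design matrix with rank $k$, $\beta$ is a $p \times k$ matrix of
regression parameters with unknown values, and $\epsilon$ is a $p \times n$ random
error matrix. It is assumed that the error variables $\epsilon_{ij}$ satisfy


$$\begin{aligned}
E(\epsilon_{ij}) &= 0, && \forall\, i = 1, \ldots, p, \; j = 1, \ldots, n, \\
E(\epsilon_{ij}^2) &= \sigma^2 \Lambda_{ii}, && \forall\, i = 1, \ldots, p, \; j = 1, \ldots, n, \\
E(\epsilon_{ij}\epsilon_{lj}) &= \sigma^2 \Lambda_{il}, && \forall\, i \ne l = 1, \ldots, p, \; j = 1, \ldots, n,
\end{aligned}$$


and
$$E(\epsilon_{ij}\epsilon_{lj'}) = 0, \quad \forall\, i, l, \; j \ne j',$$


where $\Lambda_{il}$ are unknown parameters. Furthermore, it is assumed that,
for a given $\sigma$, the errors $\epsilon_1, \ldots, \epsilon_n$ are independently and normally
distributed, with the distribution of $\epsilon_j = (\epsilon_{1j}, \ldots, \epsilon_{pj})^T$ being $N(0, \sigma^2\Lambda)$
for $j = 1, \ldots, n$, while $\sigma$ is assumed to be a random variable having an
inverted gamma distribution with the pdf given by


$$\frac{2(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \, \sigma^{-(\nu + 1)} \exp\left\{-\frac{\nu}{2\sigma^2}\right\},$$


where $\nu$ is an unknown parameter. Under these assumptions, one can
show that the joint distribution of error variables is


$$f(\epsilon) = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + np)/2)}{\pi^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}} \left[(\nu - 2) + \sum_{j=1}^{n} \epsilon_j^T R^{-1} \epsilon_j\right]^{-(\nu + np)/2},$$


where $R = \nu\Lambda/(\nu - 2)$. It then follows that $E(\epsilon_j) = 0$, $E(\epsilon_j \epsilon_j^T) = R$,
and $E(\epsilon_j \epsilon_{j'}^T) = 0$ for $j \ne j'$, $j, j' = 1, \ldots, n$.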

Sutradhar and Ali (1986) provided a least squares estimator for $\beta$ as
well as moment estimators for $R$ and $\nu$. The least squares estimator is


$$\hat\beta^T = (X X^T)^{-1} X Y^T,$$


while the moment estimators are given by


$$\hat R = \frac{1}{n} \sum_{j=1}^{n} (y_j - \hat\beta x_j)(y_j - \hat\beta x_j)^T$$


Pp p 

2 1 
Je. a a2 ~A a2 a4 
p = (5a - DEA) /(spa-irya), 

i=1 i j i=1 i j 
where $\hat\epsilon_{ij}$ are the so-called estimated residuals expressed as the difference

$$\hat\epsilon_{ij} = y_{ij} - \sum_{r=1}^{k} \hat\beta_{ir} x_{rj}.$$


All three estimators $\hat\beta$, $\hat R$, and $\hat\nu$ are consistent as $n \to \infty$.

Let $Y = (y_1, \ldots, y_n)$, where $y_j = (y_{1j}, \ldots, y_{pj})^T$. Let $Y^*$ denote
the stacked random vector corresponding to $Y$, so that $Y^* = (y_{11}, \ldots,
y_{p1}, y_{12}, \ldots, y_{p2}, \ldots, y_{1n}, \ldots, y_{pn})^T$. Let $\beta^*$ and $\epsilon^*$ be the corresponding
stacked vectors. Then the model (11.14) can be written in terms
of Kronecker products as


$$Y^* = (I_p \otimes X^T)\beta^* + \epsilon^*. \tag{11.15}$$
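Suppose one wishes to test the hypothesis $H_0 : \beta^* = \beta_0^*$ versus
$H_1 : \beta^* \ne \beta_0^*$. In the case where $\nu$ and $R$ are known, Sutradhar and Ali
(1986) showed that a suitable test statistic is

$$D = \frac{\nu}{\nu - 2} \, (\hat\beta^* - \beta_0^*)^T \left\{R \otimes (X X^T)^{-1}\right\}^{-1} (\hat\beta^* - \beta_0^*).$$

Lower values of this statistic $D$ will favor $H_0$, while higher values will
direct the rejection of $H_0$. Actually, it can be shown that the pdf of $D$
is


$$f(d) = \frac{\nu^{\nu/2} \, d^{kp/2 - 1}}{\Gamma(\nu/2)} \sum_{j=0}^{\infty} \frac{\Gamma((\nu + kp)/2 + 2j)}{j! \, \Gamma(kp/2 + j)} \, (\Delta d)^j \, (\Delta + \nu + d)^{-(\nu + kp)/2 - 2j},$$

where

$$\Delta = \frac{\nu}{\nu - 2} \, (\beta^* - \beta_0^*)^T \left\{R \otimes (X X^T)^{-1}\right\}^{-1} (\beta^* - \beta_0^*).$$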
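
The series form of the pdf of $D$ lends itself to direct numerical evaluation. The following is a minimal sketch under stated assumptions: the function names, the truncation rule, and the values $\nu = 10$, $kp = 4$, $\Delta \in \{0, 2\}$ are illustrative choices, not taken from Sutradhar and Ali (1986). It sums the series in log space and checks that the result integrates to approximately one.

```python
import math

def d_pdf(d, nu, kp, delta, max_terms=500, tol=1e-14):
    """Series form of the pdf of D at d > 0; delta is the noncentrality."""
    log_front = ((nu / 2) * math.log(nu) + (kp / 2 - 1) * math.log(d)
                 - math.lgamma(nu / 2))
    a = (nu + kp) / 2
    total = 0.0
    for j in range(max_terms):
        if delta == 0.0 and j > 0:
            break  # only the j = 0 term survives when delta = 0 (H0)
        log_term = (math.lgamma(a + 2 * j) - math.lgamma(j + 1)
                    - math.lgamma(kp / 2 + j)
                    - (a + 2 * j) * math.log(delta + nu + d))
        if j > 0:
            log_term += j * math.log(delta * d)
        term = math.exp(log_front + log_term)
        total += term
        if j > 0 and term < tol * total:
            break  # remaining terms are negligible
    return total

def integrate_pdf(nu, kp, delta, steps=4000):
    """Midpoint rule on (0, 1) after the substitution d = u / (1 - u)."""
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        d = u / (1 - u)
        total += d_pdf(d, nu, kp, delta) / (1 - u) ** 2
    return total / steps

# Hypothetical settings: nu = 10, kp = 4, under H0 and under one alternative.
print(integrate_pdf(10.0, 4, 0.0), integrate_pdf(10.0, 4, 2.0))
```

When `delta` is zero only the leading term remains, and the density reduces to that of $kp$ times an F variable with degrees of freedom $kp$ and $\nu$.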


Note that, under $H_0 : \beta^* = \beta_0^*$, $D/(kp)$ has the usual F distribution with
degrees of freedom $kp$ and $\nu$, whereas the analogous test for the classical
MANOVA model has the chi-squared distribution with degrees of freedom
$kp$. Also note that the power of the test changes under $H_1$, whereas
the similar statistic has the noncentral chi-squared distribution for the
usual normal model. In the case where $\nu$ and $R$ are not known, since $\hat\nu$
and $\hat R$ are consistent estimators, an F test based on $\hat D = \hat\nu U^T U/(\hat\nu - 2)$,
where $U = \{\hat R \otimes (X X^T)^{-1}\}^{-1/2} (\hat\beta^* - \beta_0^*)$, may still be approximately valid.


Little (1988) extended the general linear model (11.14) to handle in- 
complete data. The methods for estimation employed are maximum like- 
lihood (ML) for multivariate t and contaminated normal models. ML 
estimation was achieved by means of the EM algorithm and involves 
minor modifications to the EM algorithm for multivariate normal data. 


11.5 Nonlinear Models 


Nonlinear models involving multivariate t distributed errors have been
studied relatively recently. Chib et al. (1991) considered nonlinear
regression models with errors that follow the multivariate t distribution
with degrees of freedom $\nu$. For an $n \times 1$ vector of observations $y$, the
model is specified by


$$y = h(X, \beta) + \epsilon, \tag{11.16}$$


where $X$ is an $n \times r$ matrix of regressors, $\beta$ is the regression coefficient
vector, $h(X, \beta)$ is a vector function of $(X, \beta)$, and $\epsilon$ is the error vector. It
is assumed that $\epsilon \mid X, \beta, \tau, \eta, \nu$ has an n-variate t distribution with zero
mean vector, covariance matrix $(1/\tau) V(X, \eta)$, and degrees of freedom
$\nu$. One can see that (11.16) reduces to (11.1) simply by setting $r = p$,
$h(X, \beta) = X\beta$, and $V(X, \eta) = I_n$. The sampling density resulting from
(11.16) is a t pdf, which can be represented as the following scale mixture
of normal pdfs:


$$f(y \mid X, \omega) = \int_0^\infty f(y \mid X, z, \omega) \, f(z \mid X, \omega) \, dz,$$


where $\omega$ collects the parameters $(\beta, \tau, \eta, \nu)$, $f(y \mid X, z, \omega)$ is an n-variate
normal pdf with mean vector $h(X, \beta)$ and covariance matrix
$(z\tau)^{-1} V(X, \eta)$, and $f(z \mid X, \omega)$ is a gamma pdf
with parameters $(\nu/2, \nu/2)$. Note that the proper pdf, $f(z \mid X, \omega)$, is
independent of $X$ and does not involve parameters other than $\nu$.
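
The scale-mixture representation gives a direct way to simulate t distributed errors. The sketch below is illustrative only and makes several assumptions not in the source: it treats a single observation ($n = 1$) with a scalar $V(X, \eta) = v$, reads the gamma parameters $(\nu/2, \nu/2)$ as shape and rate, and uses hypothetical values for $\nu$, $\tau$, and $v$. It draws $z$ from the gamma pdf, then $y$ given $z$ from the normal pdf, and compares the sample variance with the value $\{\nu/(\nu - 2)\}(v/\tau)$ implied by this construction for $\nu > 2$.

```python
import math
import random

def sample_t_via_mixture(mean, tau, v, nu, rng):
    """One draw of y: z ~ Gamma(shape nu/2, rate nu/2), then
    y | z ~ N(mean, v / (z * tau))."""
    z = rng.gammavariate(nu / 2, 2 / nu)  # second argument is the scale, 1 / rate
    return rng.gauss(mean, math.sqrt(v / (z * tau)))

rng = random.Random(12345)
nu, tau, v, mean = 10.0, 2.0, 1.0, 0.0  # hypothetical values
draws = [sample_t_via_mixture(mean, tau, v, nu, rng) for _ in range(200000)]

m = sum(draws) / len(draws)
sample_var = sum((y - m) ** 2 for y in draws) / (len(draws) - 1)
expected_var = nu / (nu - 2) * v / tau  # variance implied by the mixture
print(sample_var, expected_var)
```

The same two-step draw extends to the n-variate case by replacing the scalar normal draw with a multivariate normal draw with covariance $(z\tau)^{-1} V(X, \eta)$.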

In the classical linear model due to Zellner (1976), the marginal
posterior of the regression parameter, $\beta$, is unaffected by the multivariate
t assumption (see Section 11.2). This result was extended by Chib et
al. (1988), Osiewalski (1991), and Osiewalski and Steel (1990) for
elliptically distributed errors. For the nonlinear model above, Chib et
al. (1991) provided the following sufficient conditions under which the
posterior of $\nu$, $p(\nu \mid y, X)$, coincides with the prior, $p(\nu)$:


• For proper priors $p(\omega)$, if $\nu$ is independent of $(\beta, \tau, \eta)$, then $\nu$ is
independent of $(y, X)$.

• For improper priors of the form $p(\omega) = p(\tau) p(\beta, \eta) p(\nu)$, where $p(\tau) \propto
1/\tau$, $\tau > 0$, and $p(\nu)$ is proper and functionally independent of $(\tau, \beta, \eta)$,
if the posterior of $\nu$ exists, then $p(\nu \mid y, X) = p(\nu)$.


12 
Applications 


Due to limitations on the size of this book and since the aim is to collect 
and organize results on multivariate t distributions, in this short chapter 
we collect and present a small number of relatively recent applications 
of multivariate t distributions. The treatment is by no means exhaustive.
Some other applications, in particular those related to Bayesian
inference, are mentioned in the previous chapters (see Chapters 1, 3,
5, 10, and 11).


12.1 Projection Pursuit 


Exploratory projection pursuit is a technique for finding “interesting” 
low p-dimensional projections of high P-dimensional multivariate data; 
see Jones and Sibson (1987) for an introduction. Typically, projection 
pursuit uses a projection index, a functional computed on a projected 
density (or data set), to measure the “interestingness” of the current 
projection and then uses a numerical optimizer to move the projection 
direction to a more interesting position. Loosely speaking, a robust pro- 
jection index is one that prefers projections involving true clusters over 
those consisting of a cluster and an outlier. A good robust projection 
index should perform well even when specific assumptions required for 
“normal operation” fail to hold or hold only approximately. In a paper 
that was awarded the Royal Statistical Society Bronze Medal, Nason 
(2001) developed five new indices based on measuring divergence from 
the multivariate t distribution with the joint pdf


$$f(x) = \frac{\Gamma((\nu + p)/2)}{\{\pi(\nu - 2)\}^{p/2} \, \Gamma(\nu/2)} \left(1 + \frac{x^T x}{\nu - 2}\right)^{-(\nu + p)/2}$$


that are intended to be especially robust. The first three indices are
all weighted versions of the $L^2$-divergences from $f$ for $\nu > 3$. They are
given by


$$I^{TL2}_\alpha = \int \{g(x) - f(x)\}^2 f^\alpha(x) \, dx$$


for $\alpha = 0, 1/2, 1$, where $g$ is the pdf of the projected data. Nason (2000)
derived an explicit formula for the case $\alpha = 0$. The fourth index is the
Student's t index defined by


I1 = - f peax. 


This index is minimized over all spherical densities by f(x). Specifically, 
it satisfies the inequality 


pil se a Ue pie) 
Yv = 7/2 (y — 2AT (v/2) 


for all spherical densities $g$, with equality if and only if $g = f$ almost
everywhere. The proof of this result uses the fact that the index can be
represented as the sum of two F-divergences (Vajda, 1989). Through both
numerical calculation and explicit analytical formulas, Nason (2001)
found that the Student's t indices are generally more robust and that
indices based on $L^2$-divergences are also the most robust in their class.
A detailed analytical exploration of one of the indices (r122) showed 
that it acts robustly when outliers diverge from a main cluster but be- 
haves like a standard projection index when two clusters diverge, that 
is, its behavior automatically changes depending on the degree of outlier 
contamination. The degree of sensitivity to outliers can be reduced by 
increasing the degrees of freedom $\nu$ of the TL2 index to make it behave
increasingly like Hall's index (Hall, 1989) as $\nu \to \infty$.

Using the transformation z = tan(@), Nason further developed the 
orthogonal expansion index given by 


2 [7 2 : 
m = i p (200) = Z costo) dð, 


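where $g_\theta$ is the pdf of the transformed projected data. Using the
Fourier series expansion of $g_\theta(\theta)$ on $[-\pi/2, \pi/2]$,


$$g_\theta(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \{a_n \cos(2n\theta) + b_n \sin(2n\theta)\},$$


where


$$a_n = \frac{2}{\pi} \int_{-\pi/2}^{\pi/2} g_\theta(\theta) \cos(2n\theta) \, d\theta$$


and


$$b_n = \frac{2}{\pi} \int_{-\pi/2}^{\pi/2} g_\theta(\theta) \sin(2n\theta) \, d\theta,$$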


the index nie can be expanded as 


2 2 2 
L2 amyl 3 1 1 
a = {3 (0-2) +(a-3) t(e-3 


12.2 Portfolio Optimization 


There are a number of places in finance where robust estimation has 
been used. For example, when a stock’s returns are regressed on the 
market returns, the slope coefficient, called beta, is a measure of the 
relative riskiness of the stock in comparison to the market. Quite often, 
this regression will be performed using robust procedures. However, 
there appear to be fewer applications of robust estimation in the area 
of portfolio optimization. In the problem of finding a risk-minimizing 
portfolio subject to linear constraints, the classical approach assumes 
normality without exceptions. Lauprete et al. (2002) addressed the 
problem when the return data are generated by a multivariate distri- 
bution that is elliptically symmetric but not necessarily normal. They 
showed that when the returns have marginal heavy tails and multivariate 
tail-dependence, portfolios will also have heavy tails, and the classical 
procedures will be susceptible to outliers. They showed theoretically, 
and on simulated data, that robust alternatives have lower risks. In
particular, they showed that when returns have a multivariate t distribution
with degrees of freedom less than 6, the least absolute deviation (LAD)
estimator has an asymptotically lower risk than the one based on the
classical approach. The proposed methodology is applicable when heavy
tails and tail-dependence in financial markets are documented, especially
at high sampling frequencies.




12.3 Discriminant and Cluster Analysis 


In the past, there have been many attempts to modify existing methods 
of discriminant and cluster analyses to provide robust procedures. Some 
of these have been of a rather ad hoc nature. Recently the multivari- 
ate t distribution has been employed for robust estimation. Suppose, 
for simplicity, that one utilizes two samples in order to assign a new 
observation into one of two groups, and consider the joint distribution 


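$$f(X_1^*, X_2^*) = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + np)/2)}{\pi^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}} \left[(\nu - 2) + \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^T R^{-1} (x_{ij} - \mu_i)\right]^{-(\nu + np)/2} \tag{12.1}$$


of the two samples $X_1^* = (x_{11}, \ldots, x_{1 n_1})$ and $X_2^* = (x_{21}, \ldots, x_{2 n_2})$ of
sizes $n_1$ and $n_2$, respectively. In (12.1), $n = n_1 + n_2$. The
$(n_1 + n_2)p$-dimensional t distribution (12.1) was proposed by Sutradhar (1990). It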
is evident that the marginals are distributed according to 


$$f(x_{ij}) = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + p)/2)}{\pi^{p/2} \, \Gamma(\nu/2) \, |R|^{1/2}} \left[(\nu - 2) + (x_{ij} - \mu_i)^T R^{-1} (x_{ij} - \mu_i)\right]^{-(\nu + p)/2}, \tag{12.2}$$


which is a slight reparameterization of the usual multivariate t pdf. Let
$\pi_1$ and $\pi_2$ denote the two t-populations of the form (12.2) with
parameters $(\mu_1, R, \nu)$ and $(\mu_2, R, \nu)$, respectively. Fisher's optimal
discrimination criterion is robust against departure from normality (Sutradhar,
1990), and it assigns the new observation with measurement $x$ to $\pi_1$ if


$$d(x) = (\mu_1 - \mu_2)^T R^{-1} x - \frac{1}{2} (\mu_1 - \mu_2)^T R^{-1} (\mu_1 + \mu_2) > 0;$$


otherwise, it assigns the observation to $\pi_2$. But even though the
classification is based on the robust criterion, the probability of
misclassification depends on the degrees of freedom of the t distribution. If $e_1$
and $e_2$ are probabilities of misclassification of an individual observation
from $\pi_1$ into $\pi_2$ and from $\pi_2$ into $\pi_1$, respectively, then


$$e_i = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + 1)/2)}{\pi^{1/2} \, \Gamma(\nu/2)} \int_{-\infty}^{-\Delta/2} \left[(\nu - 2) + z^2\right]^{-(\nu + 1)/2} dz$$


for $i = 1, 2$, where $\Delta^2 = (\mu_1 - \mu_2)^T R^{-1} (\mu_1 - \mu_2)$. Calculations of $e_1$
and $e_2$ for selected values of $\Delta$ and $\nu$ (Sutradhar, 1990) suggest that if a
sample actually comes from a t-population (12.2) with degrees of freedom
$\nu$, then the evaluation of the classification error rates by normal-based
probabilities would unnecessarily make an experimenter more suspicious.
Sutradhar (1990) illustrated the use of the preceding discrimination
approach by fitting the t distribution to some bivariate data on two species
of flea beetles.
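
The effect of the degrees of freedom on the misclassification probability can be illustrated numerically. The sketch below is an assumption-laden illustration rather than Sutradhar's (1990) computation: it uses the fact that, under the reparameterized pdf (12.2), the standardized discriminant score has a unit-variance univariate t distribution, so that each error probability is the probability that such a variable falls below $-\Delta/2$. The function names and the values of $\Delta$ and $\nu$ are hypothetical.

```python
import math

def t_misclassification(delta, nu, steps=20000):
    """P(Z <= -delta/2) for a unit-variance Student t variable Z (nu > 2),
    computed by a midpoint rule after mapping (-inf, -delta/2] onto (0, 1)."""
    log_c = ((nu / 2) * math.log(nu - 2) + math.lgamma((nu + 1) / 2)
             - math.lgamma(nu / 2) - 0.5 * math.log(math.pi))
    upper = -delta / 2
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        z = upper - u / (1 - u)
        log_density = log_c - (nu + 1) / 2 * math.log(nu - 2 + z * z)
        total += math.exp(log_density) / (1 - u) ** 2
    return total / steps

def normal_misclassification(delta):
    """Normal-theory counterpart, Phi(-delta/2)."""
    return 0.5 * math.erfc(delta / (2 * math.sqrt(2)))

# Hypothetical separation delta = 2 and a few degrees of freedom.
for nu in (5, 10, 200):
    print(nu, t_misclassification(2.0, nu), normal_misclassification(2.0))
```

For this separation the t-based error rate with small $\nu$ lies below the normal-based value, and the two agree closely once $\nu$ is large.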

McLachlan and Peel (1998), McLachlan et al. (1999), and Peel and 
McLachlan (2000) used a mixture model of t distributions for a robust 
method of mixture estimation of clustering. They illustrated its useful- 
ness by a cluster analysis of a simulated data set with added background 
noise and of an actual data set. For other recent methods for making 
cluster algorithms robust, see Smith et al. (1993), Davé and Krishna- 
puram (1995), Jolion et al. (1995), Frigui and Krishnapuram (1996), 
Kharin (1996), Rousseeuw et al. (1996), and Zhuang et al. (1996). 


12.4 Multiple Decision Problems 


The multivariate t distribution arises quite naturally in multiple decision
problems. In fact, it is one of the earliest applications of this distribution
in statistical inference. Suppose there are $q$ dependent variates with
means $\theta_1, \ldots, \theta_h, \ldots, \theta_q$, respectively, and that one has estimators $\hat\theta_h$ of
$\theta_h$, $h = 1, \ldots, q$, available, which are jointly distributed according to a
q-variate normal distribution with means $\theta_h$, $h = 1, \ldots, q$, and covariance
matrix $\sigma^2 R$, where $R$ is a $q \times q$ positive definite matrix and $\sigma^2$ is an
unknown scale parameter. Let $s^2$ be an unbiased estimator of $\sigma^2$ such
that $s^2$ is independent of the $\hat\theta_h$'s and $\nu s^2/\sigma^2$ has the chi-squared
distribution with degrees of freedom $\nu$. Consider $p \le q$ linearly independent
linear combinations of the $\theta_h$'s,


$$m_i = \sum_{h=1}^{q} c_{ih} \theta_h = c_i^T \theta,$$


for $i = 1, \ldots, p$, where $c_i = (c_{i1}, \ldots, c_{ih}, \ldots, c_{iq})^T$ is a $q \times 1$ vector of
known constants. The unbiased estimators of the $m_i$'s are


$$\hat m_i = \sum_{h=1}^{q} c_{ih} \hat\theta_h = c_i^T \hat\theta,$$


each of which is a normally distributed random variable with mean $m_i$
and variance $\sigma^2 c_i^T R c_i$. Then


$$Y_i = \frac{\hat m_i - m_i}{s \sqrt{c_i^T R c_i}}, \quad i = 1, \ldots, p,$$


is a Student's t-variate and $Y_1, \ldots, Y_p$ have the usual p-variate t
distribution with degrees of freedom $\nu$, zero means, and the correlation matrix
$\{\delta_{iu}\}$ given by


$$\delta_{iu} = \frac{c_i^T R c_u}{\sqrt{c_i^T R c_i \; c_u^T R c_u}}.$$


For multiple comparisons, one computes the one- and two-sided confidence
interval estimates of $m_i$ ($i = 1, \ldots, p$) simultaneously with a joint
confidence coefficient $1 - \alpha$, say. These estimates are given by (Dunnett,
1955)


$$\hat m_i + h_1 s \sqrt{c_i^T R c_i}$$


and


$$\hat m_i \pm h_2 s \sqrt{c_i^T R c_i},$$


respectively, where the constants $h_1$ and $h_2$ are determined so that the
intervals in each case have a joint coverage probability of $1 - \alpha$. The
constants $h_1$ and $h_2$ can be computed using the methods discussed in
Chapter 8.
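
The correlation matrix $\{\delta_{iu}\}$ is easy to compute for a given set of contrasts. The following minimal sketch uses a hypothetical design that is not taken from the source: a control and two treatments with 10 observations each, so that $R$ is diagonal with entries $1/10$, and the two contrasts are the treatment-minus-control differences. The helper names are likewise assumptions.

```python
import math

def quad_form(a, R, b):
    """Compute a^T R b for list-based vectors a, b and matrix R."""
    return sum(a[h] * R[h][g] * b[g]
               for h in range(len(a)) for g in range(len(b)))

def contrast_correlations(contrasts, R):
    """Correlation matrix {delta_iu} of the t-variates Y_1, ..., Y_p."""
    p = len(contrasts)
    norms = [math.sqrt(quad_form(c, R, c)) for c in contrasts]
    return [[quad_form(contrasts[i], R, contrasts[u]) / (norms[i] * norms[u])
             for u in range(p)] for i in range(p)]

# Hypothetical setup: control plus two treatments, 10 observations each.
R = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
contrasts = [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]  # treatment minus control
delta = contrast_correlations(contrasts, R)
print(delta)
```

In this balanced design the two contrasts share the control mean, so their correlation is one half; this matrix is what enters the multivariate t probability used to find $h_1$ and $h_2$.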


12.5 Other Applications 


Bayesian prediction approaches using the multivariate t distribution 
have attracted wide-ranging applications in the last several decades, 
and many sources are available in periodic and monographic literature. 
Chien (2002) discusses applications in speech recognition and online
environmental learning. In experiments on hands-free car speech
recognition of connected Chinese digits, it was shown that the proposed
approach is significantly better than conventional approaches. Blattberg
and Gonedes (1974) were among the first to discuss applications to
security returns data. For other applications, we refer the reader to the
numerous modern books on multivariate analysis and to the Proceedings 
of the Valencia International Meetings. 


References 


Abdel-Hameed, H. and Sampson, A. R. (1978). Positive dependence of the 
bivariate and trivariate absolute normal, t, $\chi^2$, and F distributions,
Annals of Statistics 6, 1360-1368. 

Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical 
Functions (Dover, New York). 

Abusev, R. A. and Kolegova, N. V. (2001). On estimation of probabilities of 
linear inequalities for multivariate t distributions, Journal of
Mathematical Sciences 103, 542-546. 

Aczel, J. (1966). Lectures on Functional Equations and Their Applications 
(Academic Press, New York). 

Afonja, B. (1972). The moments of the maximum of correlated normal and t 
variates, Journal of the Royal Statistical Society B 34, 251-262. 

Ahmed, A. and Gokhale, D. (1989). Entropy expressions and their 
estimators for multivariate distributions, IEEE Transactions on 
Information Theory 35, 688-692. 

Ahner, C. and Passing, H. (1983). Berechnung der multivariaten t-verteilung
und simultane vergleiche gegen eine kontrolle bei ungleichen 
gruppenbesetzungen, EDV in Medizin und Biologie 14, 113-120. 

Amos, D. E. (1978). Evaluation of some cumulative distribution functions by 
numerical evaluation, SIAM Review 20, 778-800. 

Amos, D. E. and Bulgren, W. G. (1969). On the computation of a bivariate t 
distribution, Mathematics of Computation 23, 319-333.

Anderson, D. N. and Arnold, B. C. (1991). Centered distributions with 
Cauchy conditionals, Communications in Statistics—Theory and 
Methods 20, 2881-2889. 

Anderson, T. W. (1984). An Introduction to Multivariate Analysis, second 
edition (John Wiley and Sons, New York). 

Anderson, T. W. and Fang, K. T. (1987). Cochran’s theorem for elliptically 
contoured distribution, Sankhya A 49, 305-315. 

Ando, A. and Kaufman, G. W. (1965). Bayesian analysis of the independent 
multi-normal process — neither mean nor precision known, Journal of 
the American Statistical Association 60, 347-358. 

Arellano-Valle, R. and Bolfarine, H. (1995). On some characterization of the 
t-distribution, Statistics and Probability Letters 25, 79-85. 

Arellano-Valle, R., Bolfarine, H. and Iglesias, P. L. (1994). A predictivistic 
interpretation of the multivariate t distribution, Test 3, 221-236.




Armitage, J. V. and Krishnaiah, R. R. (1965). Tables of percentage points of 
multivariate t distribution (abstract), Annals of Mathematical Statistics 
36, 726. 

Arnold, B. C. and Beaver, R. J. (2000). Hidden truncation models, Sankhya 
A 62, 23-35. 

Arnold, B. C. and Press, S. J. (1989). Compatible conditional distributions, 
Journal of the American Statistical Association 84, 152-156. 

Aroian, L., Taneja, V. and Cornwall, L. (1978). Mathematical forms of the 
distribution of the product of two normal variates,
Communications in Statistics—Theory and Methods 7, 165-172. 

Arslan, O., Constable, P. D. L. and Kent, J. T. (1995). Convergence
behavior of the EM algorithm for the multivariate t distribution,
Communications in Statistics— Theory and Methods 24, 2981-3000. 

Azzalini, A. and Capitanio, A. (1999). Statistical applications of the 
multivariate skew normal distribution, Journal of the Royal Statistical 
Society B 61, 579-602. 

Azzalini, A. and Capitanio, A. (2002). Distributions generated by 
perturbation of symmetry with emphasis on a multivariate skew t 
distribution. Submitted to Journal of the Royal Statistical Society B. 

Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew normal 
distribution, Biometrika 83, 715-726. 

Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). 
Statistical Inference under Order Restrictions (John Wiley and Sons, 
Chichester). 

Bechhofer, R. E. and Dunnett, C. W. (1988). Tables of percentage points of 
multivariate t distributions, in Selected Tables in Mathematical Statistics 
11, ed. R. E. Odeh and J. M. Davenport (American Mathematical 
Society, Providence, Rhode Island). 

Bechhofer, R. E., Dunnett, C. W. and Sobel, M. (1954). A two-sample 
multiple-decision procedure for ranking means of normal populations 
with a common unknown variance, Biometrika 41, 170-176. 

Bennett, B. M. (1961). On a certain multivariate normal distribution, 
Proceedings of the Cambridge Philosophical Society 57, 434-436. 

Bickel, P. J. and Lehmann, E. L. (1975). Descriptive statistics for 
nonparametric models I. Introduction, Annals of Statistics 3, 1038-1044. 

Birnbaum, Z. (1948). On random variables with comparable peakedness, 
Annals of Mathematical Statistics 19, 76-81. 

Blattberg, R. C. and Gonedes, N. J. (1974). A comparison of the stable and 
Student distributions as statistical models for stock prices, Journal of 
Business 47, 224-280. 

Bohrer, R. (1973). A multivariate t probability integral, Biometrika 60, 
647-654. 

Bohrer, R. and Francis, G. K. (1972). Sharp one-sided confidence bounds 
over positive regions, Annals of Mathematical Statistics 43, 1541-1548. 

Bohrer, R., Schervish, M. and Sheft, J. (1982). Algorithm AS 184: 
Non-central studentized maximum and related multiple-t probabilities, 
Applied Statistics 31, 309-317. 

Bowden, D. C. and Graybill, F. A. (1966). Confidence bands of uniform and 
proportional width for linear models, Journal of the American Statistical 
Association 61, 182-198. 

Branco, M. D. and Dey, D. K. (2001). A general class of multivariate
skew-elliptical distributions, Journal of Multivariate Analysis 79,
99-113.

Bretz, F., Genz, A. and Hothorn, L. A. (2001). On the numerical availability 
of multiple comparison procedures, Biometrical Journal 43, 645-656. 

Bulgren, W. G. and Amos, D. E. (1968). A note on representation of the 
doubly non-central t distribution, Journal of the American Statistical
Association 63, 1013-1019. 

Bulgren, W. G., Dykstra, R. L. and Hewett, J. E. (1974). A bivariate t 
distribution with applications, Journal of the American Statistical 
Association 69, 525-532. 

Cadwell, J. H. (1951). The bivariate normal integral, Biometrika 38, 
475-481. 

Cain, M. (1996). Forecasting with the maximum of correlated components 
having bivariate t-distributed errors, IMA Journal of Mathematics 
Applied in Business and Industry 7, 233-237. 

Capitanio, A., Azzalini, A. and Stanghellini, E. (2002). Graphical models for 
skew-normal variates, Scandinavian Journal of Statistics, to appear. 

Carlson, B. C. (1977). Special Functions and Applied Mathematics 
(Academic Press, New York). 

Castillo, E. and Sarabia, J. M. (1990). Bivariate distributions with second 
kind Beta conditionals, Communications in Statistics—Theory and 
Methods 19, 3433-3445. 

Chapman, D. G. (1950). Some two-sample tests, Annals of Mathematical 
Statistics 21, 601-606. 

Chen, H. J. (1979). Percentage points of multivariate t distribution with zero
correlations and their application, Biometrical Journal 21, 347-360.

Chib, S., Osiewalski, J. and Steel, M. F. J. (1991). Posterior inference on the
degrees of freedom parameter in multivariate-t regression models,
Economics Letters 37, 391-397.

Chib, S., Tiwari, R. C. and Jammalamadaka, S. R. (1988). Bayes prediction 
in regressions with elliptical errors, Journal of Econometrics 38, 
349-360. 

Chien, J.-T. (2002). A Bayesian prediction approach to robust speech 
recognition and online environmental learning, Speech Communication
37, 321-334. 

Chow, Y. S. and Teicher, H. (1978). Probability Theory (Springer-Verlag, 
Berlin). 

Constantine, A. G. (1963). Some noncentral distribution problems in 
multivariate analysis, Annals of Mathematical Statistics 34, 1270-1285. 

Corliss, G. F. and Rall, L. B. (1987). Adaptive, self-validating numerical 
quadrature, SIAM Journal on Scientific and Statistical Computing 8, 
831-847. 

Cornish, E. A. (1954). The multivariate t distribution associated with a set
of normal sample deviates, Australian Journal of Physics 7, 531-542. 

Cornish, E. A. (1955). The sampling distribution of statistics derived from 
the multivariate t distribution, Australian Journal of Physics 8, 193-199.

Cornish, E. A. (1962). The multivariate t distribution associated with the 
general multivariate normal distribution, CSIRO Technical Paper No. 
13, CSIRO Division in Mathematics and Statistics, Adelaide. 

Cornish, E. A. and Fisher, R. A. (1950). Moments and cumulants in the
specification of distributions, in Contributions to Mathematical Statistics
(John Wiley and Sons, New York).

Craig, C. (1936). On the frequency function of XY, Annals of Mathematical 
Statistics 7, 1-15. 

Cramér, H. (1951). Mathematical Methods of Statistics (Princeton University 
Press). 

DasGupta, A., Ghosh, J. K. and Zen, M. M. (1995). A new general method 
for constructing confidence sets in arbitrary dimensions: with 
applications, Annals of Statistics 23, 1408-1432. 

Davé, R. N. and Krishnapuram, R. (1995). Robust clustering methods: A 
unified view, IEEE Transactions on Fuzzy Systems 5, 270-293. 

David, H. A. (1982). Concomitants of order statistics: theory and 
applications, in Some Recent Advances in Statistics, ed. Tiago de 
Oliveira, pp. 89-100 (Academic Press, New York). 

Dawid, A. P. (1981). Some matrix-variate distribution theory: Notational 
considerations and a Bayesian application, Biometrika 68, 265-274. 

Deak, I. (1990). Random Number Generators and Simulation (Akademiai 
Kiado, Budapest). 

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood 
from incomplete data via the EM algorithm (with discussion), Journal 
of the Royal Statistical Society B 39, 1-38. 

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1980). Iteratively weighted 
least squares for linear regression where errors are normal independent 
distributed, in Multivariate Analysis 5, ed. P. R. Krishnaiah, pp. 35-37 
(North-Holland, New York). 

Dey, D. K. (1988). Simultaneous estimation of eigenvalues, Annals of the 
Institute of Statistical Mathematics 40, 137-147. 

Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential 
families, Annals of Statistics 7, 269-281. 

Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion (with 
discussion), in Bayesian Statistics 2, ed. J. M. Bernardo, M. H. 
DeGroot, D. V. Lindley and A. F. M. Smith, pp. 133-156 (North 
Holland, Amsterdam). 

Dickey, J. M. (1965). Integrals of products of multivariate-t densities 
(abstract), Annals of Mathematical Statistics 36, 1611. 

Dickey, J. M. (1966a). Matric-variate generalizations of the multivariate t
distribution and the inverted multivariate t distribution (abstract),
Annals of Mathematical Statistics 37, 1423. 

Dickey, J. M. (1966b). On a multivariate generalization of the 
Behrens-Fisher distributions, Annals of Mathematical Statistics 37, 763. 

Dickey, J. M. (1967a). Expansions of t densities and related complete 
integrals, Annals of Mathematical Statistics 38, 503-510. 

Dickey, J. M. (1967b). Matric-variate generalizations of the multivariate t 
distribution and the inverted multivariate t distribution, Annals of
Mathematical Statistics 38, 511-518. 

Dickey, J. M. (1968). Three multidimensional-integral identities with 
Bayesian applications, Annals of Mathematical Statistics 39, 1615-1627. 

Dickey, J. M., Dawid, A. and Kadane, J. B. (1986). Subjective-probability 
assessment methods for multivariate-t and matrix-t models, in Bayesian 
Inference and Decision Techniques, pp. 177-195 (North-Holland, 
Amsterdam). 

Dreier, I. and Kotz, S. (2002). A note on the characteristic function of the
t-distribution, Statistics and Probability Letters 57, 221-224.

Dunn, O. J. (1958). Estimation of the means of dependent variables, Annals 
of Mathematical Statistics 29, 1095-1111. 

Dunn, O. J. (1961). Multiple comparison among means, Journal of the 
American Statistical Association 56, 52-64. 

Dunn, O. J. (1965). A property of the multivariate t distribution, Annals of
Mathematical Statistics 36, 712-714. 

Dunn, O. J. and Massey, F. J. (1965). Estimation of multiple contrasts using 
t-distributions, Journal of the American Statistical Association 60, 
573-583. 

Dunnett, C. W. (1955). A multiple comparison procedure for comparing 
several treatments with a control, Journal of the American Statistical 
Association 50, 1096-1121. 

Dunnett, C. W. (1964). New tables for multiple comparisons with a control, 
Biometrics 20, 482-491. 

Dunnett, C. W. (1985). Multiple comparisons between several treatments 
and a specified treatment, in Linear Statistical Inference, Lecture Notes 
in Statistics No. 35, ed. T. Caliński and W. Klonecki, pp. 39-46
(Springer-Verlag, New York). 

Dunnett, C. W. (1989). Algorithm AS 251: Multivariate normal probability 
integrals with product correlation structure, Applied Statistics 38, 
564-579. 

Dunnett, C. W. and Sobel, M. (1954). A bivariate generalization of Student’s 
t-distribution with tables for certain special cases, Biometrika 41, 
153-169. 

Dunnett, C. W. and Sobel, M. (1955). Approximations to the probability 
integral and certain percentage points of a multivariate analogue of 
Student’s t-distribution, Biometrika 42, 258-260.

Dunnett, C. W. and Tamhane, A. C. (1990). A step-up multiple test 
procedure, Technical Report 90-1, Department of Statistics, 
Northwestern University. 

Dunnett, C. W. and Tamhane, A. C. (1991). Step-down multiple tests for 
comparing treatments with a control in unbalanced one-way layouts, 
Statistics in Medicine 10, 939-947. 

Dunnett, C. W. and Tamhane, A. C. (1992). A step-up multiple test 
procedure, Journal of the American Statistical Association 87, 162-170. 

Dunnett, C. W. and Tamhane, A. C. (1995). Step-up multiple testing of 
parameters with unequally correlated estimates, Biometrics 51, 217-227. 

Dutt, J. E. (1973). A representation of multivariate normal probability 
integrals by integral transforms, Biometrika 60, 637-645. 

Dutt, J. E. (1975). On computing the probability integral of a general 
multivariate t, Biometrika 62, 201-205. 

Dutt, J. E., Mattes, K. D., Soms, A. P. and Tao, L. C. (1976). An 
approximation to the maximum modulus of the trivariate T with a 
comparison to exact values, Biometrics 32, 465-469. 

Dutt, J. E., Mattes, K. D. and Tao, L. C. (1975). Tables of the trivariate t 
for comparing three treatments to a control with unequal sample sizes, 
G. D. Searle and Company, Math. and Statist. Services, TR-3. 

Eaton, M. L. and Efron, B. (1970). Hotelling’s T² test under symmetry
conditions, Journal of the American Statistical Association 65, 702-711. 

Edwards, D. E. and Berry, J. J. (1987). The efficiency of simulation based
multiple comparisons, Biometrics 43, 913-928.

Erdélyi, A., Magnus, W., Oberhettinger, F. and Tricomi, F. G. (1953). 
Higher Transcendental Functions, volumes 1 and 2 (McGraw-Hill, New 
York). 

Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random 
variables with applications, Annals of Mathematical Statistics 38, 
1466-1474. 

Fang, H.-B., Fang, K.-T. and Kotz, S. (2002). The meta-elliptical 
distributions with given marginals, Journal of Multivariate Analysis 82, 
1-16. 

Fang, K.-T. and Anderson, T. W. (1990). Statistical Inference in Elliptically 
Contoured and Related Distributions (Allerton Press, New York).

Fang, K.-T., Kotz, S. and Ng, K. W. (1990). Symmetric Multivariate and 
Related Distributions (Chapman and Hall, London).

Fernandez, C. and Steel, M. F. J. (1999). Multivariate Student-t regression 
models: pitfalls and inference, Biometrika 86, 153-167. 

Fisher, R. A. (1925). Expansion of Student’s integral in powers of n⁻¹,
Metron 5, 109. 

Fisher, R. A. (1935). The fiducial argument in statistical inference, Annals of 
Eugenics 6, 391-398. 

Fisher, R. A. (1941). The asymptotic approach to Behrens’ integral with
further tables for the d-test of significance, Ann. Eugen., Lond. 11, 141. 

Fisher, R. A. and Healy, M. J. R. (1956). New tables of Behrens’ test of 
significance, Journal of the Royal Statistical Society B 18, 212-216. 

Fisher, R. A. and Yates, F. (1943). Statistical Tables for Biological, 
Agricultural and Medical Research, second edition (Oliver and Boyd, 
London). 

Fleishman, A. I. (1978). A method for simulating nonnormal distributions, 
Psychometrika 43, 521-532. 

Fraser, D. A. S. and Haq, M. S. (1969). Structural probability and prediction 
for the multivariate model, Journal of the Royal Statistical Society B 31,
317-331. 

Freeman, H. and Kuzmack, A. (1972). Tables of multivariate t in six or more 
dimensions, Biometrika 59, 217-219. 

Freeman, H., Kuzmack, A. and Maurice, R. (1967). Multivariate t and the 
ranking problem, Biometrika 54, 305-308. 

Frigui, H. and Krishnapuram, R. (1996). A robust algorithm for automatic 
extraction of an unknown number of clusters from noisy data, Pattern 
Recognition Letters 17, 1223-1232. 

Fry, R. L. (ed). (2002). Bayesian Inference and Maximum Entropy Methods 
in Science and Engineering: Proceedings of the 21st International 
Workshop on Bayesian Inference and Maximum Entropy Methods in
Science and Engineering (American Institute of Physics, New York). 

Fujikoshi, Y. (1987). Error bounds for asymptotic expansions of scale 
mixtures of distributions, Hiroshima Mathematics Journal 17, 309-324. 

Fujikoshi, Y. (1988). Non-uniform error bounds for asymptotic expansions of 
scale mixtures of distributions, Journal of Multivariate Analysis 27, 
194-205. 

Fujikoshi, Y. (1989). Error bounds for asymptotic expansions of the 
maximums of the multivariate t- and F-variables with common 
denominator, Hiroshima Mathematics Journal 19, 319-327. 


Fujikoshi, Y. (1993). Error bounds for asymptotic approximations of some 
distribution functions, in Multivariate Analysis: Future Directions, ed. 
C. R. Rao, pp. 181-208 (North-Holland, Amsterdam). 

Fujikoshi, Y. (1997). An asymptotic expansion for the distribution of 
Hotelling’s T²-statistic under nonnormality, Journal of Multivariate
Analysis 61, 187-193. 

Fujikoshi, Y. and Shimizu, R. (1989). Error bounds for asymptotic 
expansions of scale mixtures of univariate and multivariate 
distributions, Journal of Multivariate Analysis 30, 279-291. 

Fujikoshi, Y. and Shimizu, R. (1990). Asymptotic expansions of some 
distributions and their error bounds-the distributions of sums of 
independent random variables and scale mixtures, Sugaku Expositions 3, 
75-96. 

Geisser, S. (1965). Bayesian estimation in multivariate analysis, Annals of 
Mathematical Statistics 36, 150-159. 

Geisser, S. and Cornfield, J. (1963). Posterior distributions for multivariate 
normal parameters, Journal of the Royal Statistical Society B 25, 
368-376. 

Genz, A. (1992). Numerical computation of the multivariate normal 
probabilities, Journal of Computational and Graphical Statistics 1, 
141-150. 

Genz, A. and Bretz, F. (1999). Numerical computation of multivariate t 
probabilities with application to power calculation of multiple contrasts, 
Journal of Statistical Computation and Simulation 63, 361-378. 

Genz, A. and Bretz, F. (2001). Methods for the computation of multivariate 
t-probabilities, Journal of Computational and Graphical Statistics. 

Ghosh, B. K. (1973). Some monotonicity theorems for χ², F and t
distributions with applications, Journal of the Royal Statistical Society 
B 35, 480-492. 

Ghosh, B. K. (1975). On the distribution of the difference of two t variables, 
Journal of the American Statistical Association 70, 463-467. 

Gill, M. L., Tiku, M. L. and Vaughan, D. C. (1990). Inference problems in 
life testing under multivariate normality, Journal of Applied Statistics 
17, 133-147. 

Glaz, J. and Johnson, B. McK. (1984). Probability inequalities for 
multivariate distributions with dependence structures, Journal of the 
American Statistical Association 79, 435-440. 

Goldberg, H. and Levine, H. (1946). Approximate formulas for the 
percentage points and normalization of t and χ², Annals of
Mathematical Statistics 17, 216. 

Goodman, M. R. (1963). Statistical analysis based on a certain multivariate 
complex Gaussian distribution (an introduction), Annals of 
Mathematical Statistics 34. 

Graybill, F. A. and Bowden, D. C. (1967). Linear segment confidence bands 
for simple linear models, Journal of the American Statistical Association 
62, 403-408. 

Grosswald, E. (1976). The Student t-distribution of any degree of freedom is 
infinitely divisible, Zeitschrift für Wahrscheinlichkeitstheorie und 
Verwandte Gebiete 36, 103-109. 

Guerrero-Cusumano, J.-L. (1996a). A measure of total variability for the
multivariate t distribution with applications to finance, Information
Sciences 92, 47-63.

Guerrero-Cusumano, J.-L. (1996b). An asymptotic test of independence for 
multivariate t and Cauchy random variables with applications,
Information Sciences 92, 33-45. 

Guerrero-Cusumano, J.-L. (1998). Measures of dependence for the 
multivariate t distribution with applications to the stock market,
Communications in Statistics—Theory and Methods 27, 2985-3006. 

Gupta, A. K. (2000). Multivariate skew t distribution, Technical Report No.
00-04, Department of Mathematics and Statistics, Bowling Green State 
University, Bowling Green, Ohio. 

Gupta, A. K. and Kollo, T. (2000). Multivariate skew normal distribution:
some properties and density expansions, Technical Report No. 00-03, 
Department of Mathematics and Statistics, Bowling Green State 
University, Bowling Green, Ohio. 

Gupta, R. P. (1964). Some extensions of the Wishart and multivariate t 
distributions in the complex case, Journal of the Indian Statistical 
Association 2, 131-136. 

Gupta, S. S. (1963). Probability integrals of multivariate normal and 
multivariate t, Annals of Mathematical Statistics 34, 792-828. 

Gupta, S. S., Nagel, K. and Panchapakesan, S. (1973). On the order 
statistics from equally correlated normal random variables, Biometrika 
60, 403-413. 

Gupta, S. S., Panchapakesan, S. and Sohn, J. K. (1985). On the distribution 
of the studentized maximum of equally correlated normal random 
variables, Communications in Statistics—Simulation and Computation 
14, 103-135. 

Gupta, S. S. and Sobel, M. (1957). On a statistic which arises in selection 
and ranking problems, Annals of Mathematical Statistics 28, 957-967. 

Hahn, G. J. and Hendrickson, R. W. (1971). A table of percentage points of
the distribution of the largest absolute value of k Student t variates and
its application, Biometrika 58, 323-332.

Hahn, M. G. and Klass, M. J. (1980a). Matrix normalization of sums of 
random vectors in the domain of attraction of the multivariate normal, 
Annals of Probability 8, 262-280. 

Hahn, M. G. and Klass, M. J. (1980b). The generalized domain of attraction 
of spherically symmetric stable laws in R^d, in Proceedings of the
Conference on Probability in Vector Spaces II, Lecture Notes in 
Mathematics 828, pp. 52-81 (Springer-Verlag, New York). 

Halgreen, C. (1979). Self-decomposability of the generalized inverse Gaussian 
and hyperbolic distributions, Zeitschrift für Wahrscheinlichkeitstheorie 
und Verwandte Gebiete 47, 13-17. 

Hall, P. (1989). On polynomial-based projection indices for exploratory 
projection pursuit, Annals of Statistics 17, 589-605. 

Halperin, M. (1967). An inequality on a bivariate t distribution, Journal of
the American Statistical Association 62, 603-606. 

Halperin, M., Greenhouse, S. W., Cornfield, J. and Zalokar, J. (1955). Tables 
of percentage points for the studentized maximum absolute deviate in 
normal samples, Journal of the American Statistical Association 50, 
185-195. 

Hammersley, J. M. and Handscomb, D. C. (1964). Monte Carlo Methods 
(Methuen & Co. Ltd, London). 


Haq, M. S. and Khan, S. (1990). Prediction distribution for a linear 
regression model with multivariate Student-t error distribution,
Communications in Statistics—Theory and Methods 19, 4705-4712. 

Harter, H. L. (1951). On the distribution of Wald’s classification statistic, 
Annals of Mathematical Statistics 22, 58-67. 

Hayakawa, T. (1989). On the distributions of the functions of the F-matrix 
under an elliptical population, Journal of Statistical Planning and 
Inference 21, 41-52. 

Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures
(John Wiley and Sons, New York). 

Hsu, H. (1990). Noncentral distributions of quadratic forms for elliptically 
contoured distributions, in Statistical Inference in Elliptically Contoured 
and Related Distributions, pp. 97-102 (Allerton, New York). 

Hsu, J. C. (1992). The factor analytic approach to simultaneous inference in 
the general linear model, Journal of Computational and Graphical 
Statistics 1, 151-168. 

Hsu, J. C. and Nelson, B. L. (1998). Multiple comparisons in the general 
linear model, Journal of Computational and Graphical Statistics 7, 
23-41. 

Hutchinson, T. P. and Lai, C. D. (1990). Continuous Bivariate Distributions, 
Emphasising Applications (Rumsby, Adelaide). 

Ifram, A. F. (1970). On the characteristic function of F and t distributions,
Sankhya A 32, 350-352. 

International Mathematical and Statistical Libraries (1987). MATH/Library, 
Fortran Subroutines for Mathematical Applications (International 
Mathematical and Statistical Libraries, Houston). 

Iwashita, T. (1997). Asymptotic null and nonnull distribution of Hotelling’s 
T²-statistic under the elliptical distribution, Journal of Statistical
Planning and Inference 61, 85-104. 

Iyengar, S. (1988). Evaluation of normal probabilities of symmetric regions, 
SIAM Journal on Scientific and Statistical Computing 9, 418-423. 

James, A. T. (1964). Distribution of matrix variates and latent roots derived 
from normal samples, Annals of Mathematical Statistics 35, 475-501. 

James, W. and Stein, C. (1961). Estimation with quadratic loss, in 
Proceedings of the Fourth Berkeley Symposium on Mathematical 
Statistics and Probability 1, pp. 361-379. 

Javier, W. R. and Gupta, A. K. (1985). On matric variate-t distribution,
Communications in Statistics—Theory and Methods 14, 1413-1425. 

Javier, W. R. and Srivastava, T. N. (1988). On the multivariate t 
distribution, Pakistan Journal of Statistics 4, 101-109. 

Jaynes, E. T. (1957). Information theory and statistical mechanics, Physical
Review 106, 620-630.

Jensen, D. R. (1994). Closure of multivariate t and related distributions,
Statistics and Probability Letters 20, 307-312. 

Joarder, A. H. (1995). Estimation of the trace of the scale matrix of a 
multivariate t-model, in Proceedings of the Econometrics Conference, 
pp. 467-474 (Monash University, Australia). 

Joarder, A. H. (1998). Some useful Wishart expectations based on the 
multivariate t-model, Statistical Papers 39, 223-229. 

Joarder, A. H. and Ahmed, S. E. (1996). Estimation of the characteristic 
roots of the scale matrix, Metrika 44, 259-267. 


Joarder, A. H. and Ahmed, S. E. (1998). Estimation of the scale matrix of a 
class of elliptical distributions, Metrika 48, 149-160. 

Joarder, A. H. and Ali, M. M. (1992). On some generalized Wishart 
expectations, Communications in Statistics—Theory and Methods 21, 
283-294. 

Joarder, A. H. and Ali, M. M. (1996). On the characteristic function of the 
multivariate t distribution, Pakistan Journal of Statistics 12, 55-62. 

Joarder, A. H. and Ali, M. M. (1997). Estimation of the scale matrix of a 
multivariate t-model under entropy loss, Metrika 46, 21-32. 

Joarder, A. H. and Singh, S. (1997). Estimation of the trace of the scale 
matrix of a multivariate t-model using regression type estimator, 
Statistics 29, 161-168. 

Joe, H. (1989). Relative entropy measures of multivariate dependence, 
Journal of the American Statistical Association 84, 157-164. 

Joe, S. (1990). Randomization of lattice rules for numerical multiple 
integration, Journal of Computational and Applied Mathematics 31, 
299-304. 

Jogdeo, K. (1977). Association of probability inequalities, Annals of 
Statistics 5, 495-504. 

John, S. (1961). On the evaluation of the probability integral of the 
multivariate t distribution, Biometrika 48, 409-417.

John, S. (1964). Methods for the evaluation of probabilities of polygonal and 
angular regions when the distribution is bivariate t, Sankhyā A 26, 
47-54. 

John, S. (1966). On the evaluation of probabilities of convex polyhedra 
under multivariate normal and t distributions, Journal of the Royal 
Statistical Society B 28, 366-369. 

Johnson, M. (1987). Multivariate Statistical Simulation (John Wiley and 
Sons, New York). 

Johnson, N. L. and Kotz, S. (1972). Distributions in Statistics: Continuous 
Multivariate Distributions (John Wiley and Sons, New York). 

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous 
Univariate Distributions, volume 2, second edition (John Wiley and 
Sons, New York). 

Johnson, R. A. and Weerahandi, S. (1988). A Bayesian solution to the 
multivariate Behrens-Fisher problem, Journal of the American 
Statistical Association 83, 145-149. 

Jolion, J.-M., Meer, P. and Bataouche, S. (1995). Robust clustering with 
applications in computer vision, IEEE Transactions on Pattern Analysis
and Machine Intelligence 13, 791-802. 

Jones, M. C. (2001a). A skew t distribution, in Probability and Statistical 
Models with Applications, ed. C. A. Charalambides, M. V. Koutras and 
N. Balakrishnan, pp. 269-278 (Chapman and Hall, London). 

Jones, M. C. (2001b). Multivariate t and beta distributions associated with 
multivariate F distribution, Metrika 54, 215-231. 

Jones, M. C. (2002a). A bivariate distribution with support above the 
diagonal and skew t marginals. Submitted. 

Jones, M. C. (2002b). A dependent bivariate t distribution with marginals on
different degrees of freedom, Statistics and Probability Letters 56, 
163-170. 

Jones, M. C. (2002c). Marginal replacement in multivariate densities, with
application to skewing spherically symmetric distributions, Journal of
Multivariate Analysis 81, 85-99.

Jones, M. C. and Faddy, M. J. (2002). A skew extension of the t distribution 
with applications. Submitted. 

Jones, M. C. and Sibson, R. (1987). What is projection pursuit (with 
discussion)? Journal of the Royal Statistical Society A 150, 1-36. 

Kabe, D. G. and Gupta, A. K. (1990). Hotelling’s T²-distribution for a
mixture of two normal populations, South African Statistical Journal 
24, 87-92. 

Kano, Y. (1994). Consistency property of elliptical probability density 
functions, Journal of Multivariate Analysis 51, 139-147. 

Kano, Y. (1995). An asymptotic expansion of the distribution of Hotelling’s 
T²-statistic under general distributions, American Journal of
Mathematical and Management Sciences 15, 317-341. 

Kappenman, R. F. (1971). A note on the multivariate t ratio distribution,
Annals of Mathematical Statistics 42, 349-351. 

Kass, R. E. and Steffey, D. (1989). Approximate Bayesian inference in conditionally
independent hierarchical models, Journal of the American Statistical 
Association 84, 717-726. 

Kelejian, H. H. and Prucha, I. R. (1985). Independent or uncorrelated 
disturbances in linear regression: An illustration of the difference, 
Economics Letters 19, 35-38.

Kelker, D. (1970). Distribution theory of spherical distributions and location 
scale parameters, Sankhya A 32, 419-430. 

Kelker, D. (1971). Infinite divisibility and variance mixtures of the normal 
distribution, Annals of Mathematical Statistics 42, 802-808. 

Kendall, M. G. and Stuart, A. (1958). The Advanced Theory of Statistics 
(Hafner, New York). 

Kent, J. T., Tyler, D. E. and Vardi, Y. (1994). A curious likelihood identity 
for the multivariate t-distribution, Communications in 
Statistics—Simulation and Computation 23, 441-453. 

Kharin, Y. (1996). Robustness in Statistical Pattern Recognition (Kluwer, 
Dordrecht). 

Khatri, C. G. (1967). On certain inequalities for normal distributions and 
their applications to simultaneous confidence bands, Annals of 
Mathematical Statistics 38, 1853-1867. 

Kiefer, J. and Schwarz, R. (1965). Admissible Bayes character of T²-, R²-,
and other fully invariant tests for classical multivariate normal 
problems, Annals of Mathematical Statistics 36, 747-770. 

Kopal, Z. (1955). Numerical Analysis (Chapman and Hall, London).

Kottas, A., Adamidis, K. and Loukas, S. (1999). Bivariate distributions with
Pearson type VII conditionals, Annals of the Institute of Statistical
Mathematics 51, 331-344.

Kotz, S., Balakrishnan, N. and Johnson, N. L. (2000). Continuous 
Multivariate Distributions, Volume 1: Models and Applications, second 
edition (John Wiley and Sons, New York). 

Kotz, S., Lumelski, Y. and Pensky, M. (2003). Strength Stress Models with 
Applications (World Scientific Press, Singapore). 

Kozumi, H. (1994). Testing equality of the means in two independent 
multivariate t distributions, Communications in Statistics—Theory and
Methods 23, 215-227. 


Krishnaiah, P. R. and Armitage, J. V. (1966). Tables for multivariate t 
distribution, Sankhya B 28, 31-56. 

Krishnan, M. (1959). Studies in statistical inference, Ph.D. Thesis, Madras 
University, India. 

Krishnan, M. (1967a). The moments of a doubly noncentral t distribution, 
Journal of the American Statistical Association 62, 278-287. 

Krishnan, M. (1967b). The noncentral bivariate chi distribution, SIAM 
Review 9, 708-714. 

Krishnan, M. (1968). Series representations of the doubly noncentral t 
distribution, Journal of the American Statistical Association 63, 
1004-1012. 

Krishnan, M. (1970). The bivariate doubly noncentral t distribution 
(abstract), Annals of Mathematical Statistics 41, 1135. 

Krishnan, M. (1972). Series representations of a bivariate singly noncentral t
distribution, Journal of the American Statistical Association 67, 
228-231. 

Kshirsagar, A. M. (1961). Some extensions of the multivariate generalization 
t distribution and the multivariate generalization of the distribution of 
the regression coefficient, in Proceedings of the Cambridge Philosophical 
Society 57, pp. 80-85. 

Kudô, A. (1963). A multivariate analogue of the one-sided test, Biometrika
50, 403-418. 

Kullback, S. (1968). Information Theory and Statistics (John Wiley and 
Sons, New York). 

Kunte, S. and Rattihalli, R. N. (1984). Rectangular regions of maximum 
probability content, Annals of Statistics 12, 1106-1108. 

Kurths, J., Voss, A., Saparin, P., Witt, A., Kleiner, H. J. and Wessel, N.
(1995). Quantitative analysis of heart rate variability, Chaos 1, 88-94. 

Kwong, K.-S. (2001a). A modified Dunnett and Tamhane step-up approach 
for establishing superiority/equivalence of a new treatment compared
with k standard treatments, Journal of Statistical Planning and 
Inference 97, 359-366. 

Kwong, K.-S. (2001b). An algorithm for construction of multiple hypothesis 
testing, Computational Statistics 16, 165-171. 

Kwong, K.-S. and Iglewicz, B. (1996). On singular multivariate normal 
distribution and its applications, Computational Statistics and Data 
Analysis 22, 271-285.

Kwong, K.-S. and Liu, W. (2000). Calculation of critical values for Dunnett 
and Tamhane’s step-up multiple test procedure, Statistics and 
Probability Letters 49, 411-416. 

Landenna, G. and Ferrari, P. (1988). The k-variate student distribution and 
a test with the control of type I error in multiple decision problems, 
Technical Report, Istituto di Scienze Statistiche e Matematiche, 
Universita di Milano, Italy. 

Lange, K. and Sinsheimer, J. S. (1993). Normal/independent distributions 
and their applications in robust regression, Journal of Computational 
and Graphical Statistics 2, 175-198. 

Lange, K. L., Little, R. J. A. and Taylor, J. M. G. (1989). Robust statistical 
modeling using the t distribution, Journal of the American Statistical
Association 84, 881-896. 

Lauprete, G. J., Samarov, A. M. and Welsch, R. E. (2002). Robust portfolio
optimization, Metrika 55, 139-149.

Lazo, A. and Rathie, P. (1978). On the entropy of continuous probability 
distributions, IEEE Transactions on Information Theory 24, 120-122. 

Lebedev, N. N. (1965). Special Functions and Their Applications 
(Prentice-Hall Inc., New Jersey). 

Lee, R. E. and Spurrier, J. D. (1995). Successive comparisons between 
ordered treatments, Journal of Statistical Planning and Inference 43, 
323-330. 

Lehmann, E. L. (1966). Some concepts of dependence, Annals of 
Mathematical Statistics 37, 1137-1153. 

Leonard, T. (1982). Comment on “A simple predictive density function” by
M. Lejeune and G. D. Faulkenberry, Journal of the American
Statistical Association 77, 657-658. 

Leonard, T., Hsu, J. S. J. and Ritter, C. (1994). The Laplacian 
T-approximation in Bayesian inference, Statistica Sinica 4, 127-142. 

Leonard, T., Hsu, J. S. J. and Tsui, K. W. (1989). Bayesian marginal 
inference, Journal of the American Statistical Association 84, 
1051-1058. 

Lin, P. (1972). Some characterizations of the multivariate t distribution, 
Journal of Multivariate Analysis 2, 339-344. 

Linfoot, E. (1957). An informational measure of correlation, Information and 
Control 1, 85-89. 

Little, R. J. A. (1988). Robust estimation of the mean and covariance matrix 
from data with missing values, Applied Statistics 37, 23-39. 

Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing
Data (John Wiley and Sons, New York). 

Liu, C. (1993). Bartlett’s decomposition of the posterior distribution of the 
covariance for normal monotone ignorable missing data, Journal of 
Multivariate Analysis 46, 198-206.

Liu, C. (1995). Missing data imputation using the multivariate t 
distribution, Journal of Multivariate Analysis 53, 139-158. 

Liu, C. (1996). Bayesian robust multivariate linear regression with 
incomplete data, Journal of the American Statistical Association 91, 
1219-1227. 

Liu, C. (1997). ML estimation of the multivariate t distribution and the EM
algorithm, Journal of Multivariate Analysis 63, 296-312. 

Liu, C. and Rubin, D. B. (1995). ML estimation of the multivariate t 
distribution with unknown degrees of freedom, Statistica Sinica 5, 19-39. 

Liu, C., Rubin, D. B. and Wu, Y. N. (1998). Parameter expansion to 
accelerate EM: the PX-EM algorithm, Biometrika 85, 755-770. 

Liu, W., Miwa, T. and Hayter, A. J. (2000). Simultaneous confidence 
interval estimation for successive comparisons of ordered treatment 
effects, Journal of Statistical Planning and Inference 88, 75-86. 

Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some 
properties and applications, Annals of Statistics 7, 381-394. 

Magnus, W., Oberhettinger, F. and Soni, R. P. (1966). Formulas and 
Theorems for the Special Functions of Mathematical Physics 
(Springer-Verlag, New York). 

Mann, N. R. (1982). Optimal outlier tests for a Weibull model - To identify 
process changes or to predict failure times, TIMS/Studies in the 
Management Sciences 19, 261-279. 


Mardia, K. V. (1970a). Families of Bivariate Distributions (Griffin, London). 

Mardia, K. V. (1970b). Measures of multivariate skewness and kurtosis with 
applications, Biometrika 57, 519-530. 

Maronna, R. A. (1976). Robust M-estimators of multivariate location and 
scatter, Annals of Statistics 4, 51-67. 

Marsaglia, G. (1965). Ratios of normal variables and ratios of sums of 
uniform variables, Journal of the American Statistical Association 60, 
193-204. 

Marshall, A. W. and Olkin, I. (1974). Majorization in multivariate 
distributions, Annals of Statistics 2, 1189-1200. 

McLachlan, G. J. and Peel, D. (1998). Robust cluster analysis via mixtures 
of multivariate t-distributions, in Lecture Notes in Computer Science 
1451, ed. A. Amin, D. Dori, P. Pudil and H. Freeman, pp. 658-666
(Springer-Verlag, Berlin). 

McLachlan, G. J., Peel, D., Basford, K. E. and Adams, P. (1999). Fitting of 
mixtures of normal and t components, Journal of Statistical Software 4.

Meng, X. L. and van Dyk, D. (1997). The EM algorithm—An old folk song 
sung to a fast new tune (with discussion), Journal of the Royal Statistical
Society B 59, 511-567. 

McCann, M. and Edwards, D. (1996). A path inequality for the multivariate 
t distribution, with applications to multiple comparisons, Journal of the 
American Statistical Association 91, 211-216. 

Miller, K. S. (1968). Some multivariate t distributions, Annals of
Mathematical Statistics 39, 1605-1609. 

Milton, R. C. (1963). Tables of the equally correlated multivariate normal 
probability integral, Technical Report No. 27, University of Minnesota, 
Minneapolis. 

Morales, D., Pardo, L. and Vajda, I. (1997). Some new statistics for testing 
hypotheses in parametric models, Journal of Multivariate Analysis 62, 
137-168. 

Nagarsenker, B. N. (1975). Some distribution problems connected with 
multivariate t distribution, Metron 33, 66-74. 

Nason, G. P. (2000). Analytic formulae for projection indices in a robustness 
experiment, Technical Report 00:06, Department of Mathematics, 
University of Bristol. 

Nason, G. P. (2001). Robust projection indices, Journal of the Royal 
Statistical Society B 63, 551-567. 

Neyman, J. (1959). Optimal asymptotic tests for composite hypotheses, in 
Probability and Statistics, ed. U. Grenander, pp. 213-234 (John Wiley 
and Sons, New York). 

Nicholson, C. (1943). The probability integral for two variables, Biometrika 
33, 59-72. 

Osiewalski, J. (1991). A note on Bayesian inference in a regression model 
with elliptical errors, Journal of Econometrics 48, 183-193. 

Osiewalski, J. and Steel, M. F. J. (1990). Robust Bayesian inference in 
elliptical regression models, Center Discussion Paper 9032, Tilburg 
University. 

Owen, D. B. (1956). Tables for computing bivariate normal probabilities, 
Annals of Mathematical Statistics 27, 1075-1090. 

Owen, D. B. (1965). A special case of a bivariate non-central t distribution,
Biometrika 52, 437-446. 


Patil, S. A. and Kovner, J. L. (1968). On the probability of trivariate 
Student’s t distribution (abstract), Annals of Mathematical Statistics
39, 1784. 

Patil, S. A. and Kovner, J. L. (1969). On the bivariate doubly noncentral t
distributions (abstract), Annals of Mathematical Statistics 40, 1868.

Patil, S. A. and Liao, S. H. (1970). The distribution of the ratios of means to
the square root of the sum of variances of a bivariate normal sample,
Annals of Mathematical Statistics 41, 723-728.

Patil, V. H. (1965). Approximation to the Behrens-Fisher distributions, 
Biometrika 52, 267-271. 

Patnaik, P. B. (1955). Hypotheses concerning the means of observations in 
normal samples, Sankhya 15, 343-372. 

Paulson, E. (1952). On the comparison of several experimental categories 
with a control, Annals of Mathematical Statistics 23, 239-246. 

Pearson, K. (1923). On non-skew frequency surfaces, Biometrika 15, 231. 

Pearson, K. (1931). Tables for Statisticians and Biometricians, Part II 
(Cambridge University Press for the Biometrika Trust, London). 

Peel, D. and McLachlan, G. J. (2000). Robust mixture modelling using the t 
distribution, Statistics and Computing 10, 339-348. 

Pestana, D. (1977). Note on a paper of Ifram, Sankhya A 39, 396-397. 

Pillai, K. C. S. and Ramachandran, K. V. (1954). Distribution of a 
Studentized order statistic, Annals of Mathematical Statistics 25, 
565-571. 

Press, S. J. (1969). The t ratio distribution, Journal of the American 
Statistical Association 64, 242-252. 

Press, S. J. (1972). Applied Multivariate Analysis (Holt, Rinehart and 
Winston, Inc, New York). 

Press, W. H. (1986). Numerical Recipes: The Art of Scientific Computing 
(Cambridge University Press, Cambridge). 

Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory 
(Harvard University Press, Cambridge, MA). 

Rattihalli, R. N. (1981). Regions of maximum probability content and their 
applications, Ph.D. Thesis, University of Poona, India. 

Rausch, W. and Horn, M. (1988). Applications and tabulations of the 
multivariate t distribution with ρ = 0, Biometrical Journal 30, 595-605. 

Rényi, A. (1959). On the dimension and entropy of probability distributions, 
Acta Mathematica Academiae Scientiarum Hungaricae 10, 193-215. 

Rényi, A. (1960). A few fundamental problems of information theory (in 
Hungarian), A Magyar Tudományos Akadémia Matematikai és Fizikai 
Tudományok Osztályának Közleményei 10, 251-282. 

Rényi, A. (1961). On measures of entropy and information, in Proceedings of 
the Fourth Berkeley Symposium on Mathematical Statistics and 
Probability I, pp. 547-561 (University of California Press, Berkeley). 

Robbins, H. (1948). The distribution of Student’s t when the population 
means are unequal, Annals of Mathematical Statistics 19, 406-410. 

Rousseeuw, P. J., Kaufman, L. and Trauwaert, E. (1996). Fuzzy clustering 
using scatter matrices, Computational Statistics and Data Analysis 23, 
135-151. 

Ruben, H. (1960). On the distribution of weighted difference of two 
independent Student variates, Journal of the Royal Statistical Society B 
22, 188-194. 




Rubin, D. B. (1983). Iteratively reweighted least squares, in Encyclopedia of 
Statistical Sciences 4, ed. S. Kotz and N. L. Johnson, pp. 272-275 (John 
Wiley and Sons, New York). 

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys (John 
Wiley and Sons, New York). 

Rubin, D. B. and Schafer, J. L. (1990). Efficiently creating multiple 
imputations for incomplete multivariate normal data, in Proceedings of 
the Statistical Computing Section of the American Statistical 
Association, pp. 83-88 (American Statistical Association, Washington, 
DC). 

Sahu, S. K., Dey, D. K. and Branco, M. D. (2000). A new class of 
multivariate skew distributions with applications to Bayesian regression 
models, Research Report RT-MAE 2000-16, Department of Statistics, 
University of Sao Paulo, Sao Paulo, Brasil. 

Sarabia, J. M. (1995). The centered normal conditionals distribution, 
Communications in Statistics—Theory and Methods 24, 2889-2900. 

Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data (Chapman 
and Hall, London). 

Scott, A. (1967). A note on conservative confidence regions for the mean of a 
multivariate normal, Annals of Mathematical Statistics 38, 278-280. 
Correction: Annals of Mathematical Statistics 39, 1968, 2161. 

Seal, K. C. (1954). On a class of decision procedures for ranking means, 
Institute of Statistics Mimeograph Series No. 109, University of North 
Carolina at Chapel Hill. 

Seneta, E. (1993). Probability inequalities and Dunnett’s test, in Multiple 
Comparisons, Selection, and Applications in Biometry, pp. 29-45 
(Marcel Dekker, New York). 

Sepanski, S. J. (1994). Asymptotics for multivariate t-statistic and 
Hotelling’s T²-statistic under infinite second moments via 
bootstrapping, Journal of Multivariate Analysis 49, 41-54. 

Sepanski, S. J. (1996). Asymptotics for multivariate t-statistic for random 
vectors in the generalized domain of attraction of the multivariate 
normal law, Statistics and Probability Letters 30, 179-188. 

Shampine, L. F. and Allen, R. C. (1973). Numerical Computing: An 
Introduction (Saunders, Philadelphia). 

Shimizu, R. and Fujikoshi, Y. (1997). Sharp error bounds for asymptotic 
expansions of the distribution functions for scale mixtures, Annals of the 
Institute of Statistical Mathematics 49, 285-297. 

Šidák, Z. (1965). Rectangular confidence regions for means of multivariate 
normal distributions, Bulletin of the Institute of International Statistics 
41, 380-381. 

Šidák, Z. (1967). Rectangular confidence regions for the means of 
multivariate normal distributions, Journal of the American Statistical 
Association 62, 626-633. 

Šidák, Z. (1971). On probabilities of rectangles in multivariate Student 
distributions: their dependence and correlations, Annals of 
Mathematical Statistics 42, 169-175. 

Šidák, Z. (1973). A chain of inequalities for some types of multivariate 
distributions, with nine special cases, Applications of Mathematics 18, 
110-118. 

Siddiqui, M. M. (1967). A bivariate t distribution, Annals of Mathematical 
Statistics 38, 162-166. 

Singh, R. K. (1988). Estimation of error variance in linear regression models 
with errors having multivariate student-t distribution with unknown 
degrees of freedom, Economics Letters 27, 47-53. 

Singh, R. K. (1991). James-Stein rule estimators in linear regression models 
with multivariate-t distributed error, Australian Journal of Statistics 33, 
145-158. 

Singh, R. K., Mistra, S. and Pandey, S. K. (1995). A generalized class of 
estimators in linear regression models with multivariate-t distributed 
error, Statistics and Probability Letters 23, 171-178. 

Siotani, M. (1959). The extreme value of the generalized distances of the 
individual points in the multivariate normal sample, Annals of the 
Institute of Statistical Mathematics 10, 183-208. 

Siotani, M. (1964). Interval estimation for linear combinations of means, 
Journal of the American Statistical Association 59, 1141-1164. 

Siotani, M. (1976). Conditional and stepwise multivariate t distributions, in 
Essays in Probability and Statistics, pp. 287-303 (Tokyo). 

Singh, R. S. (1991). James-Stein rule estimators in linear regression models 
with multivariate-t distributed error, Australian Journal of Statistics 33, 
145-158. 

Sloan, I. H. and Joe, S. (1994). Lattice Methods for Multiple Integration 
(Clarendon Press, Oxford). 

Smith, D. J., Bailey, T. C. and Munford, G. (1993). Robust classification of 
high-dimensional data using artificial neural networks, Statistics and 
Computing 3, 71-81. 

Somerville, P. N. (1993a). Simultaneous confidence intervals (General linear 
model), Bulletin of the International Statistical Institute 2, 427-428. 

Somerville, P. N. (1993b). Exact all-pairwise multiple comparisons for the 
general linear model, in Proceedings of the 25th Symposium on the 
Interface, Computing Science and Statistics, pp. 352-356 (Interface 
Foundation, Virginia). 

Somerville, P. N. (1993c). Simultaneous multiple orderings, Technical Report 
TR-93-1, Department of Statistics, University of Central Florida, 
Orlando. 

Somerville, P. N. (1994). Multiple comparisons, Technical Report TR-94-1, 
Department of Statistics, University of Central Florida, Orlando. 

Somerville, P. N. (1997). Multiple testing and simultaneous confidence 
intervals: Calculation of constants, Computational Statistics and Data 
Analysis 25, 217-233. 

Somerville, P. N. (1998a). A Fortran 90 program for evaluation of 
multivariate normal and multivariate-t integrals over convex regions, 
Journal of Statistical Software, 
http://www.stat.ucla.edu/journals/jss/v03/i04. 

Somerville, P. N. (1998b). Numerical computation of multivariate normal 
and multivariate t probabilities over convex regions, Journal of 
Computational and Graphical Statistics 7, 529-544. 

Somerville, P. N. (1999a). Numerical evaluation of multivariate integrals over 
ellipsoidal regions, Bulletin of the International Statistical Institute. 

Somerville, P. N. (1999b). Critical values for multiple testing and 
comparisons: one step and step down procedures, Journal of Statistical 
Planning and Inference 82, 129-138. 




Somerville, P. N. (2001). Numerical computation of multivariate normal and 
multivariate t probabilities over ellipsoidal regions, Journal of Statistical 
Software, 
http://www.stat.ucla.edu/www.jstatsoft.org/v06/i08. 

Somerville, P. N. and Bretz, F. (2001). Fortran 90 and SAS-IML programs 
for computation of critical values for multiple testing and simultaneous 
confidence intervals, Journal of Statistical Software, 
http://www.stat.ucla.edu/www.jstatsoft.org/v06/i05. 

Somerville, P. N., Miwa, T., Liu, W. and Hayter, A. (2001). Combining 
one-sided and two-sided confidence interval procedures for successive 
comparisons of ordered treatment effects, Biometrical Journal 43, 
533-542. 

Song, K.-S. (2001). Rényi information, loglikelihood and an intrinsic 
distribution measure, Journal of Statistical Planning and Inference 93, 
51-69. 

Spanier, J. and Oldham, K. B. (1987). An Atlas of Functions (Hemisphere 
Publishing Company, Washington, DC). 

Spurrier, J. D. and Isham, S. P. (1985). Exact simultaneous confidence 
intervals for pairwise comparisons of three normal means, Journal of the 
American Statistical Association 80, 438-442. 

Srivastava, M. S. and Awan, H. M. (1982). On the robustness of Hotelling’s 
T²-test and distribution of linear and quadratic forms in sampling from 
a mixture of two multivariate normal populations, Communications in 
Statistics—Theory and Methods 11, 81-107. 

Steffens, F. E. (1969a). A stepwise multivariate t distribution, South African 
Statistical Journal 3, 17-26. 

Steffens, F. E. (1969b). Critical values for bivariate Student t-tests, Journal 
of the American Statistical Association 64, 637-646. 

Steffens, F. E. (1970). Power of bivariate studentized maximum and 
minimum modulus tests, Journal of the American Statistical Association 
65, 1639-1644. 

Steffens, F. E. (1974). A bivariate t distribution which occurs in stepwise 
regression (abstract), Biometrics 30, 385. 

Steyn, H. S. (1993). On the problem of more than one kurtosis parameter in 
multivariate analysis, Journal of Multivariate Analysis 44, 1-22. 

Stone, M. (1964). Comments on a posterior distribution of Geisser and 
Cornfield, Journal of the Royal Statistical Society B 26, 274-276. 

Sukhatme, P. V. (1938). On Fisher and Behrens’ test of significance for the 
difference in means of two normal samples, Sankhya 4, 39-48. 

Sultan, S. A. and Tracy, D. S. (1996). Moments of the complex multivariate 
normal distribution. Special issue honoring Calyampudi Radhakrishna 
Rao, Linear Algebra and Its Applications 237/238, 191-204. 

Sun, L., Hsu, J. S. J., Guttman, I. and Leonard, T. (1996). Bayesian 
methods for variance component models, Journal of the American 
Statistical Association 91, 743-752. 

Sutradhar, B. C. (1986). On the characteristic function of the multivariate 
Student t-distribution, Canadian Journal of Statistics 14, 329-337. 

Sutradhar, B. C. (1988a). Author’s revision, Canadian Journal of Statistics 
16, 323. 

Sutradhar, B. C. (1988b). Testing linear hypothesis with t error variable, 
Sankhya B 175-180. 




Sutradhar, B. C. (1990). Discrimination of observations into one of two t 
populations, Biometrics 46, 827-835. 

Sutradhar, B. C. (1993). Score test for the covariance matrix of the elliptical 
t-distribution, Journal of Multivariate Analysis 46, 1-12. 

Sutradhar, B. C. and Ali, M. M. (1986). Estimation of the parameters of a 
regression model with a multivariate t error variable, Communications 
in Statistics—Theory and Methods 15, 429-450. 

Sutradhar, B. C. and Ali, M. M. (1989). A generalization of the Wishart 
distribution for the elliptical model and its moments for the multivariate 
t model, Journal of Multivariate Analysis 29, 155-162. 

Sweeting, T. J. (1984). Approximate inference in location-scale regression 
models, Journal of the American Statistical Association 79, 847-852. 

Sweeting, T. J. (1987). Approximate Bayesian analysis of censored survival 
data, Biometrika 74, 809-816. 

Takano, K. (1994). On Bessel equations and the Lévy representation of the 
multivariate t distribution, Technical Report, Department of 
Mathematics, Ibaraki University, Japan. 

Tan, W. Y. (1969a). Note on the multivariate and the generalized 
multivariate beta distributions, Journal of the American Statistical 
Association 64, 230-241. 

Tan, W. Y. (1969b). Some distribution theory associated with complex 
Gaussian distribution, Tamkang Journal 7, 263-302. 

Tan, W. Y. (1973). On the complex analogue of Bayesian estimation of a 
multivariate regression model, Annals of the Institute of Statistical 
Mathematics 25, 135-152. 

Tiao, G. C. and Zellner, A. (1964). On the Bayesian estimation of 
multivariate regression, Journal of the Royal Statistical Society B 26, 
277-285. 

Tierney, L. and Kadane, J. (1986). Accurate approximations for posterior 
moments and marginal densities, Journal of the American Statistical 
Association 81, 82-86. 

Tiku, M. L. (1967). Tables of the power of the F-test, Journal of the 
American Statistical Association 62, 525-539. 

Tiku, M. L. and Gill, P. S. (1989). Modified maximum likelihood estimators 
for the bivariate normal based on Type II censored samples, 
Communications in Statistics—Theory and Methods 18, 3505-3518. 

Tiku, M. L. and Kambo, N. S. (1992). Estimation and hypothesis testing for 
a new family of bivariate nonnormal distributions, Communications in 
Statistics—Theory and Methods 21, 1683-1705. 

Tiku, M. L. and Suresh, R. P. (1992). A new method of estimation for 
location and scale parameters, Journal of Statistical Planning and 
Inference 30, 281-292. 

Tong, Y. L. (1970). Some probability inequalities of multivariate normal and 
multivariate t, Journal of the American Statistical Association 65, 
1243-1247. 

Tong, Y. L. (1982). Rectangular and elliptical probability inequalities for 
Schur-concave random variables, Annals of Statistics 10, 637-642. 

Tranter, C. J. (1968). Bessel Functions with Some Physical Applications 
(English Universities Press Ltd., London). 

Trout, J. R. and Chow, B. (1972). Table of the percentage points of the 
trivariate t distribution with an application to uniform confidence 
bands, Technometrics 14, 855-879. 

Vaduva, I. (1985). Computer generation of random vectors based on 
transformation of uniformly distributed vectors, in Proceedings of the 
Seventh Conference on Probability Theory, ed. M. Iosifescu, pp. 589-598 
(VNU Science Press, Utrecht). 

Vajda, I. (1989). Theory of Statistical Inference and Information (Kluwer 
Academic Publishers, Dordrecht). 

Vale, C. D. and Maurelli, V. A. (1983). Simulating multivariate nonnormal 
distributions, Psychometrika 48, 465-471. 

van Dijk, H. K. (1985). Existence conditions for posterior moments of 
simultaneous equation model parameters, Report 8551 of the 
Econometric Institute, Erasmus University, Rotterdam. 

van Dijk, H. K. (1986). A product of multivariate T densities as upper bound 
for the posterior kernel of simultaneous equation model parameters. 

Vijverberg, W. P. M. (1995). Monte Carlo evaluation of multivariate normal 
probabilities, Journal of Econometrics. 

Vijverberg, W. P. M. (1996). Monte Carlo evaluation of multivariate 
Student’s t probabilities, Economics Letters 52, 1-6. 

Vijverberg, W. P. M. (1997). Monte Carlo evaluation of multivariate normal 
probabilities, Journal of Econometrics 76, 281-307. 

Vijverberg, W. P. M. (2000). Rectangular and wedge-shaped multivariate 
normal probabilities, Economics Letters 68, 13-20. 

Wald, A. (1944). On a statistical problem arising in the classification of an 
individual into one of two groups, Annals of Mathematical Statistics 15, 
145-162. 

Wallgren, C. M. (1980). The distribution of the product of two correlated t 
variates, Journal of the American Statistical Association 75, 996-1000. 

Walker, G. A. and Saw, J. G. (1978). The distribution of linear combinations 
of t variables, Journal of the American Statistical Association 73, 
876-878. 

Wang, O. and Kennedy, W. J. (1990). Comparison of algorithms for 
bivariate normal probability over a rectangle based on self-validating 
results from interval analysis, Journal of Statistical Computation and 
Simulation 37, 13-25. 

Wang, O. and Kennedy, W. J. (1997). Application of numerical interval 
analysis to obtain self-validating results for multivariate probabilities in 
a massively parallel environment, Statistics and Computing 7, 163-171. 

Watson, G. N. (1958). A Treatise on the Theory of Bessel Functions 
(Cambridge University Press, Cambridge). 

Whittaker, E. T. and Watson, G. N. (1952). Modern Analysis (Cambridge 
University Press, Cambridge). 

Weir, J. B. de V. (1966). Table of 0.1 percentage points of Behrens’s d, 
Biometrika 53, 267-268. 

Wooding, R. A. (1956). The multivariate distribution of complex normal 
variables, Biometrika 43, 212-215. 

Wu, C. F. J. (1983). On the convergence properties of the EM algorithm, 
Annals of Statistics 11, 95-103. 

Wynn, H. P. and Bloomfield, P. (1971). Simultaneous confidence bands for 
regression analysis (with discussion), Journal of the Royal Statistical 
Society B 33, 202-217. 

Yang, Z. Q. and Zhang, C. M. (1997). Dimension reduction and 
L1-approximation for evaluations of multivariate normal integrals, 
Chinese Journal of Numerical Mathematics and Applications 19, 82-95. 

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics 
(John Wiley and Sons, New York). 

Zellner, A. (1976). Bayesian and non-Bayesian analysis of the regression 
model with multivariate Student-t error terms, Journal of the American 
Statistical Association 71, 400-405. 

Zhuang, X., Huang, Y., Palaniappan, K. and Zhao, Y. (1996). Gaussian 
density mixture modeling, decomposition and applications, IEEE 
Transactions on Image Processing 5, 1293-1302. 

Zografos, K. (1999). On maximum entropy characterization of Pearson’s type 
II and type VII multivariate distributions, Journal of Multivariate 
Analysis 71, 67-75. 


Index 


F matrix, 20 
F-divergence, 242 
L²-divergence, 242 
t-ratio distribution, 60 
Fortran, 153, 154 


acceptance-rejection sampling, 157, 158 
Appell’s hypergeometric function, 32, 33 
Appell’s polynomials, 10 

association, 20 


Bartlett’s decomposition, 213 
Bayesian estimator, 159-161 
Bayesian inference, 2, 50, 112, 213, 214, 
233-236, 241, 246 

Behrens-Fisher density, 32, 33 
Bessel function 

first kind, 42 

second kind, 42 
beta function, 78 
Bonferroni’s inequalities, 177 
BUGS, 226, 227 


characteristic function, 36-41, 43-45, 
48, 76, 120, 135, 200 
Cholesky decomposition, 151, 156 
cluster analysis, 52, 212, 244, 245 
complementary error function, 143, 144 
complete elliptical integral 
first kind, 48, 78, 79 
second kind, 48, 78 
concomitant order statistics, 207, 208 
cone, 145, 146 
confluent hypergeometric function, 47, 
71, 90 
consistency, 9 
convex polyhedra, 148, 150 
correlation structure 
decomposable, 171, 189, 190 


equi, 25, 34, 127, 134, 135, 143, 172, 
175, 178, 182, 183, 187 
intraclass, 112 
quasi-decomposable, 189 
singular, 189 
cumulant generating function, 121, 122, 
205 


decision theory, 12 
dependence coefficient, 25 
digamma function, 22, 23, 204, 218 
discriminant analysis, 244 
distribution 
asymmetric multivariate t, 97, 99 
bivariate t, 13, 15, 21, 56, 72, 74, 80, 
81, 106, 127, 148, 161, 170, 174, 
181, 183, 207 
bivariate Cauchy, 68, 80 
bivariate chi-squared, 71, 73 
bivariate normal, 15, 20, 53, 63, 66, 
68, 69, 71, 73-75, 80, 119, 129, 
148, 154, 170, 174, 209 
bivariate Pearson type VII, 80 
central bivariate t, 72, 127 
central matrix-variate t, 113-116 
central multivariate t, 1, 16, 17, 19, 
22, 29, 87, 88, 90, 99, 114-117, 
126, 131, 135, 140, 165, 170, 172, 
175, 182, 204, 205 
central univariate F, 20 
complex matrix-variate t, 120 
complex multivariate t, 119, 120 
complex multivariate normal, 119, 
120 
complex univariate normal, 120 
compound normal, 96 
conditionally specified bivariate t, 76 
Dirichlet, 84, 225 
doubly noncentral bivariate t, 71 
doubly noncentral multivariate t, 93 




doubly noncentral univariate t, 71, 93 

doubly noncentral univariate 
chi-squared, 93 

Fisher’s z, 34 

gamma, 226, 239 

generalized t, 94, 96, 97, 101 

Gumbel, 106 

infinite matrix-variate t, 119 

inverse gamma, 94 

inverted Dirichlet, 126 

inverted gamma, 192, 196, 220, 237 

matrix-variate t, 97, 112-115, 118, 
119 

matrix-variate beta, 117 

matrix-variate inverse Wishart, 118 

matrix-variate normal, 118 

mixture of normal, 9 

mixture of two normal, 200 

multivariate F, 100, 110 

multivariate t, 1, 8, 16, 22, 24, 30, 36, 
38, 41, 44, 50, 87, 88, 94, 98, 102, 
103, 106, 113, 115, 117, 120, 126, 
127, 135, 140, 145, 151, 153, 154, 
159, 161, 165, 170, 172-174, 183, 
184, 187, 188, 191, 204, 210, 
213-215, 223, 225-228, 233-236, 
239, 241, 243-246 

multivariate Bessel, 9 

multivariate Cauchy, 1, 9, 22, 25, 40, 
41, 100, 192 

multivariate elliptical, 2, 94, 96, 100, 
120, 121, 123, 125, 126, 196, 200, 
201, 240, 243 

multivariate logistic, 9 

multivariate normal, 1, 2, 9, 12, 13, 
15, 18, 24, 31, 50, 52, 73, 87, 90, 
93-96, 98-100, 103, 112, 119, 
123, 125, 140, 152, 155, 156, 165, 
171, 181, 192, 196, 205, 213, 225, 
226, 233, 239, 245 

multivariate Pearson type II, 9 

multivariate Pearson type VII, 2, 9, 
40, 41 

multivariate uniform, 157, 158, 226 

multivariate Wishart, 7 

noncentral bivariate t, 63, 66, 68, 69, 
73 

noncentral complex Wishart, 197 

noncentral matrix-variate t, 113 

noncentral multivariate t, 1, 23, 87, 
90, 93, 139, 140 

noncentral multivariate Cauchy, 23 

noncentral univariate F, 19, 34 

noncentral univariate t, 63 

noncentral univariate chi-squared, 71, 
209, 239 

noncentral Wishart, 197 




power exponential, 9 

scale mixture of normal, 194, 239 

skewed multivariate t, 98, 100, 
102-105, 107, 109, 112 

skewed multivariate Cauchy, 100 

skewed multivariate elliptical, 101 

skewed multivariate normal, 98, 103 

skewed univariate t, 83, 84, 106, 107, 
109 

spherical, 9, 96 

standard normal, 49, 57, 63, 80, 82, 
98, 130, 132, 133, 136, 144, 166, 
175, 177, 181-184, 186, 224 

stepwise multivariate t, 90, 91 

Student’s t, 1, 13, 23, 28, 30, 36, 40, 
41, 49, 52, 53, 56, 63, 64, 66, 67, 
75, 79, 80, 82, 84, 92, 98, 99, 101, 
105, 111, 116, 129, 137, 145, 150, 
156, 161, 163, 167, 169, 170, 175, 
179, 185, 230, 246 

symmetric multivariate t, 111 

trivariate t, 21, 35, 183 

trivariate normal, 67, 121, 123 

univariate F, 107, 109, 116, 117, 151, 
152, 179, 199, 203, 230, 234, 238 

univariate beta, 83, 106, 164 

univariate Cauchy, 23, 56, 68 

univariate chi-squared, 2, 21, 53, 63, 
70, 80, 82, 83, 87, 90, 91, 98, 100, 
103, 108, 120, 131, 132, 135, 152, 
165, 181, 196, 200-202, 209, 216, 
234, 239, 245 

univariate Poisson, 93 

univariate uniform, 226 

Wishart, 193, 213 


econometrics, 56, 114 

Edgeworth form, 133 

EM algorithm, 210-212, 235, 239 
entropy, 21-23, 204 

entropy loss function, 219, 220 
error function, 141 
exchangeability, 172 

exponential family, 96 


factor analysis, 193, 215 

fiducial distribution, 45 

Fisher’s optimal discrimination 
criterion, 244 

forecast error, 13 

forecasting, 12, 15 

Fortran, 158, 169 

Fourier series expansion, 242 

Fourier transform, 140 


Gauss hypergeometric function, 46, 67, 
78, 81, 137 




Gauss-Hermite quadrature, 134, 143 
Gauss-Legendre quadrature, 152 
general hypergeometric function, 113 
generalized gamma function, 113, 193 
Gram-Charlier expansion, 133 


Hall’s index, 242 

Hartley’s differential-difference 
equation, 133 

Hermite polynomial, 132, 136, 176 

hierarchical models, 214 

Hotelling’s T² statistic, 199, 202, 203, 
209, 210 


importance sampling, 161, 163, 164 
imputation 

multiple, 212, 213 

single, 212 
incomplete beta function, 62, 128, 139, 
148, 149 

incomplete beta function ratio, 139, 232 
incomplete gamma function ratio, 47 
infinitely divisible, 41, 43 
information matrix, 217-219, 235 
inversion theorem, 135 


Kullback-Leibler number, 24, 204, 205 
kurtosis, 28, 121-123, 125, 126, 200, 
223, 224 


Lévy representation, 41, 43 
Laguerre polynomial, 177 
Laplacian T-approximation, 214 
Laplacian approximation, 214 
lattice rule algorithm, 157, 158 
least absolute deviation estimator, 243 
linear inequalities, 158 
linear model 
Bayesian, 102, 233 
classical, 228, 239 
general, 237 
indexed, 235 
linear regression, 213 
multivariate, 213 
linear simultaneous equation model, 114 
load, 158 
logit, 164 


MacDonald function, 36, 38, 40, 42 
macroeconomic modeling, 114 
Mahalanobis distance, 210 
MANOVA, 239 
maximum entropy 
characterization, 23 
distribution, 23 
maximum probability content, 160 
microeconomic modeling, 114 




missing data imputation, 212 

moment generating function, 76, 
121-123, 125, 198, 205 

monotone data augmentation, 213 

Monte Carlo algorithm, 152, 157 

Monte Carlo simulation, 161, 162 

multiple comparison procedure, 154, 158 

multiple correlation coefficient, 91 

multiple decision problem, 245 

multiple regression analysis, 91 

multiple stationary normal time series, 
120 

multivariate t model, 191-194, 219, 221, 
222 

generalized, 196, 222 

multivariate t statistic, 198, 199 

multivariate skewness, 102 

multivariate tail-dependence, 243 

mutual information, 24, 25 


nonlinear model, 239, 240 

nonlinear regression, 235, 239 
multivariate, 235 

normal law, 198, 199 


online environmental learning, 246 
order statistics, 186, 207, 208 
orthant symmetry, 202 

orthogonal expansion index, 242 
orthogonally invariant, 97, 117 


parabolic cylinder function, 90, 144 
partial regression coefficient, 216 
percentage points, 57, 59, 174-178, 
180-185, 187-189 

Poisson mixture, 93 
portfolio optimization, 243 
predictive distribution, 114, 232, 233 
prior 

diffuse, 50, 52, 114, 234 

improper, 240 

natural conjugate, 52, 114, 234 

proper, 240 
probability inequalities, 165, 169-172 
projection index, 241, 242 
projection pursuit, 241 


quadrant dependence, 20 

quadratic form, 19, 20, 34, 50, 96, 100, 
104, 115, 125, 197, 198, 218 

quadratic loss function, 194, 222 

quadrature formulas, 137 


Rényi distances, 28 

Rényi information, 26-28 
randomized block design, 178 
ranking and selection, 56 




rejection algorithm, 226 
reliability, 152 
robust estimation, 235 


Schur-concave, 2, 172 

score function, 215, 217 

score test, 215, 216 

security returns, 246 

Shannon entropy, 27 

Simpson’s rule, 169 

skewness, 102, 121, 217, 223, 224 

slope coefficient, 243 

spectral decomposition, 195, 221 

spectral density matrix, 120 

speech recognition, 246 

squared error loss function, 221 

standardized cumulant, 133 

StatLib, 212 

Stein loss function, 219 

stock market problems, 192 

strength, 158 

Student’s t index, 242 

Studentized maximum and minimum 
modulus tests, 139 

Studentized statistics, 134 


Taylor series expansion, 35, 230 
multivariate, 154 

triangular decomposition, 195 

trigamma function, 235 

Tukey’s procedure, 154 


variance component model, 214 


Wishart matrix, 191-193, 196, 204, 219, 
221, 222 


